The York-Toronto-Helsinki Parsed Corpus
of Old English Prose
Filenames
All filenames begin with co, following Helsinki practice. Texts that
were included in the Helsinki Corpus have the same filename here. In providing
filenames for the texts not in the Helsinki Corpus we have relaxed the
8-character rule slightly, since all operating systems can now handle longer
filenames. Other conventions followed when naming files are:
- Texts from the Helsinki Corpus have the Helsinki period attached as an
extension, following PPCME2 practice. Thus, coaelhom.o3 is a text
from Helsinki period 3. When Helsinki provides two periods, the first being
the period of composition and the second the period of the manuscript, both
periods are included in the filename: coadrian.o34 is a text composed in
period 3 for which the manuscript was written in period 4. coaugust, on the
other hand, is a text not included in the Helsinki Corpus, so it has no
extension.
- Some of the texts in the corpus are included in more than one
manuscript version. The texts involved have the same filename but end with
a capital letter, different in each case, indicating the manuscript. This
letter is in most cases the traditional letter name for the manuscript
(e.g., cochronA is the A manuscript of the Anglo-Saxon Chronicle,
the others being designated cochronC, cochronD, cochronE), but when
no traditional letter name exists, it is just a convenient letter (e.g.,
the names comargaC and comargaT do not reflect a traditional
letter name). Note that when searching it is important to consider whether
to include all manuscript versions in the search, since including more than
one version of the same text may result in unintended duplicate data.
- Numbers ending otherwise identical filenames, on the other hand, have
no such significance. They are either part of the name of the text (e.g.,
cocathom1 for Catholic Homilies I), indicate parts (e.g., comart1,
comart2, comart3), or number two unrelated Old English translations of
the same source (e.g., coeluc1, coeluc2).
- All parsed files have the final extension .psd. Part-of-speech tagged
files have the same names with the extension .pos.
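The conventions above can be sketched as a small filename parser. This is only an illustration and not part of the corpus distribution: the names FILENAME_RE, parse_filename and manuscript_versions are hypothetical, and the pattern assumes that the period extension is always the letter o followed by one or two digits, as in the examples given above. Note that a trailing capital letter is treated as a manuscript letter, while trailing numbers are left as part of the name, since they carry no such significance.

```python
import re
from collections import defaultdict

# Assumed shape, inferred only from the conventions described above:
#   co<name>[.o<one or two period digits>][.psd | .pos]
FILENAME_RE = re.compile(
    r"^(?P<stem>co[A-Za-z0-9]+)"
    r"(?:\.o(?P<periods>\d{1,2}))?"
    r"(?:\.(?P<fmt>psd|pos))?$"
)


def parse_filename(name):
    """Split a corpus filename into its parts (hypothetical helper)."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised corpus filename: {name!r}")
    stem = m.group("stem")
    digits = m.group("periods")
    # A final capital letter marks the manuscript version.
    manuscript = stem[-1] if stem[-1].isupper() else None
    return {
        "stem": stem,
        # Name shared by all manuscript versions of the same text.
        "text": stem[:-1] if manuscript else stem,
        "manuscript": manuscript,
        # One period, or (composition period, manuscript period).
        "periods": tuple(int(d) for d in digits) if digits else (),
        # "psd" for parsed files, "pos" for part-of-speech tagged files.
        "format": m.group("fmt"),
    }


def manuscript_versions(names):
    """Group files that are manuscript versions of the same text."""
    groups = defaultdict(list)
    for name in names:
        info = parse_filename(name)
        if info["manuscript"]:
            groups[info["text"]].append(name)
    # Keep only texts that really occur in more than one version.
    return {text: files for text, files in groups.items() if len(files) > 1}
```

Running manuscript_versions over the list of files selected for a search is one way to spot texts that would be counted more than once before the search is run.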
List of texts by filename