The York-Toronto-Helsinki Parsed Corpus
of Old English Prose

Non-linguistic annotations within the text

Token IDs
The text
Text Markup


Token IDs

Each token in the corpus ends with a unique ID, which consists of the filename, the DOE short title, a locator for the token in the printed text (usually following DOE practice) and, lastly, a token number unique to that file. This information is contained in a node (that is, a pair of parentheses with a label after the opening parenthesis) labelled ID. The ID node itself is contained within the wrapper, the outermost (unlabelled) pair of parentheses.
( (CODE <T03010000800,25>)
  (IP-MAT (NP-NOM (NUM^N An) (N^N woruldcynincg))
          (HVPI h+af+d)
          (NP-ACC (NP (Q fela)
                      (NP-GEN (N^G +tegna)))
                  (CONJP (CONJ and)
                         (NP-ACC (ADJ^A mislice) (N^A wicneras))))
          (. ;)) 
  (ID copreflives,+ALS_[Pref]:25.14)) <-- ID node

( (PP (PP (P For) 
          (NP-DAT (Q^D miclum) (N^D gesceade)))
      (, .)
      (CONJP (CONJ &) (ADV eac)
             (PP (P for)
                 (NP (N neode))))
      (. .)) 
  (ID cocathom1,+ACHom_I,_13:283.79.2424)) <-- ID node

In the two tokens above, the IDs are decomposed as follows:
filename               copreflives        cocathom1
short title            +ALS_[Pref]        +ACHom_I,_13
line number            25                 page 283, line 79
token number           14		  2424
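
The decomposition above can be sketched in Python. This is a minimal illustration, not part of the corpus tools: the names TokenID, extract_id and parse_token_id are hypothetical, and the sketch assumes that the filename never contains a comma, that the last colon separates the short title from the locator, and that the token number follows the last full stop. These assumptions hold for the two example IDs but have not been checked against the whole corpus.

```python
import re
from typing import NamedTuple, Optional


class TokenID(NamedTuple):
    """Hypothetical container for the four parts of an ID."""
    filename: str
    short_title: str
    location: str      # line number, or page.line, kept as text
    token_number: int


# An ID node is "(ID <id>)"; the ID itself contains no spaces or parentheses
# in the examples shown (assumption).
ID_NODE = re.compile(r"\(ID\s+([^\s()]+)\)")


def extract_id(token: str) -> Optional[str]:
    """Return the raw ID string of a bracketed token, or None if absent."""
    match = ID_NODE.search(token)
    return match.group(1) if match else None


def parse_token_id(raw: str) -> TokenID:
    """Split a raw ID into filename, short title, location and token number."""
    # The short title may itself contain commas, so split on the first one only.
    filename, rest = raw.split(",", 1)
    # Assumption: the last colon separates the short title from the locator.
    short_title, locator = rest.rsplit(":", 1)
    # The token number is whatever follows the last full stop.
    location, token_number = locator.rsplit(".", 1)
    return TokenID(filename, short_title, location, int(token_number))
```

For example, parse_token_id("cocathom1,+ACHom_I,_13:283.79.2424") yields the filename cocathom1, the short title +ACHom_I,_13, the location 283.79 (page 283, line 79) and the token number 2424.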

The text

The text of the corpus is that of the Dictionary of Old English Project (Toronto). The following modifications have been made.

Text Markup

Text markup is kept to a minimum so as not to interfere with the annotation; what remains is in HTML-style format (<markup> ...</markup>). All text markup is additionally enclosed within a node labelled CODE to differentiate it from the text. The following codes can be found in the corpus:


Emendations that are made either by the editor of the text or by the DOE Corpus (labelled <corr>...</corr> in the DOE texts) are marked as emendations with the emendation symbol $ at the beginning of every word or partial word emended. No comment is added.
(NP-NOM-VOC (PRO^N +tu) 
	    (NP-NOM-PRN (ADJ^N $halige) (N^N $modor))) <-- emended text
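
As a rough illustration of how the $ marking might be used, the following Python sketch lists the emended words in a bracketed tree. The name emended_words is hypothetical, and the sketch assumes that a leaf has the form "(LABEL word)" with no spaces or parentheses inside the label or the word. It only looks for $ at the start of a leaf, so it would not detect a $ placed elsewhere in a word.

```python
import re

# A leaf node: "(LABEL word)", where neither part contains spaces or parentheses.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def emended_words(tree: str) -> list[tuple[str, str]]:
    """Return (label, word) pairs for leaves whose word starts with '$'.

    The '$' itself is stripped from the returned word.
    """
    return [
        (label, word[1:])
        for label, word in LEAF.findall(tree)
        if word.startswith("$")
    ]
```

Applied to the example above, this returns the pairs (ADJ^N, halige) and (N^N, modor); the unemended leaf (PRO^N +tu) is skipped.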

Emendations made to the text by the YCOE team fall into the following categories.


All changes that were made to the DOE text (apart from the correction of clear errors) are accompanied by a comment. There are three types of comment: