Each token in the corpus
has a unique ID at the end, which includes the filename, DOE short title,
some way of locating the token in the printed text (usually following DOE
practice) and lastly, a token number unique to that file. This information
is contained in a node (that is, a pair of parentheses with a label on the
opening parenthesis) labelled ID. The ID node itself is contained within
the wrapper, the outermost (unlabelled) pair of parentheses.
( (CODE <T03010000800,25>)
  (IP-MAT (NP-NOM (NUM^N An) (N^N woruldcynincg))
          (HVPI h+af+d)
          (NP-ACC (NP (Q fela) (NP-GEN (N^G +tegna)))
                  (CONJP (CONJ and) (NP-ACC (ADJ^A mislice) (N^A wicneras))))
          (. ;))
  (ID copreflives,+ALS_[Pref]:25.14))     <-- ID node

( (PP (PP (P For) (NP-DAT (Q^D miclum) (N^D gesceade)))
      (, .)
      (CONJP (CONJ &) (ADV eac) (PP (P for) (NP (N neode))))
      (. .))
  (ID cocathom1,+ACHom_I,_13:283.79.2424))     <-- ID node

In the two tokens above, the IDs are decomposed as follows:
filename        copreflives     cocathom1
short title     +ALS_[Pref]     +ACHom_I,_13
line number     25              page 283, line 79
token number    14              2424
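The decomposition above can be sketched in code. The following is a minimal illustration, not part of the corpus tools: the names `TokenID` and `parse_token_id` are hypothetical, and the sketch assumes that the filename never contains a comma, that the last colon separates the short title from the location, and that the token number follows the last full stop.

```python
from typing import NamedTuple


class TokenID(NamedTuple):
    """Parts of an ID node's content (hypothetical helper type)."""
    filename: str
    short_title: str
    location: str      # line, or page and line, as printed in the ID
    token_number: int


def parse_token_id(raw: str) -> TokenID:
    # The filename ends at the first comma; short titles may contain commas.
    filename, _, rest = raw.partition(",")
    # Assumption: the last colon separates the short title from the location.
    short_title, _, tail = rest.rpartition(":")
    # Assumption: the token number is whatever follows the last full stop.
    location, _, number = tail.rpartition(".")
    return TokenID(filename, short_title, location, int(number))


if __name__ == "__main__":
    print(parse_token_id("copreflives,+ALS_[Pref]:25.14"))
    print(parse_token_id("cocathom1,+ACHom_I,_13:283.79.2424"))
```

Splitting from the right keeps the sketch usable for IDs whose location itself contains full stops or commas, but it has only been checked against the IDs quoted in this section.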
The text of the corpus is that of the Dictionary of Old English Project
(Toronto). The following modifications have been made.
eth          +d         Eth      +D
ash          +a         Ash      +A
thorn        +t         Thorn    +T
e-cedilla    +e
barred t     +tt
barred l     vel
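For display purposes the character codes listed above can be mapped back to Unicode. The sketch below is only an illustration: the function name `decode_doe` is hypothetical, the choice of U+0119 for e-cedilla is an assumption, and `+tt` is deliberately left untouched because no Unicode equivalent is given here and the sequence cannot be told apart from thorn followed by t without further context.

```python
import re

# Character codes from the list above; the Unicode targets are the usual
# Old English letters. U+0119 for e-cedilla is an assumption of this sketch.
DOE_TO_UNICODE = {
    "+d": "\u00f0",  # eth
    "+D": "\u00d0",  # Eth
    "+a": "\u00e6",  # ash
    "+A": "\u00c6",  # Ash
    "+t": "\u00fe",  # thorn
    "+T": "\u00de",  # Thorn
    "+e": "\u0119",  # e-cedilla (assumed rendering)
}

# "tt" is tried first so that "+tt" is matched as one unit and left as is.
_CODE = re.compile(r"\+(tt|[daDAtTe])")


def decode_doe(text: str) -> str:
    """Replace known +-codes with Unicode letters; keep anything else."""
    return _CODE.sub(lambda m: DOE_TO_UNICODE.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(decode_doe("h+af+d"), decode_doe("+tegna"), decode_doe("+d+are"))
```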
The title information has been reduced to one code that contains the number of the file, the short title, and the Cameron number. Parentheses in the titles have been altered to square brackets.

<T02040_+ACHom_I_[Pref]_B1.1.1>

The identifiers have been slightly altered in format, as follows:

<s id="T02040000100" n="174.44">     <-- DOE identifier
<T02040000100,174.44>                <-- YCOE equivalent
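The change of identifier format shown above amounts to joining the two attribute values with a comma. A minimal sketch follows, assuming the DOE identifier always has exactly the attributes `id` and `n` in that order, as in the one example given; the function name `doe_to_ycoe` is hypothetical.

```python
import re

# Assumption: DOE identifiers look like <s id="..." n="...">, as in the example.
_DOE_S_TAG = re.compile(r'<s\s+id="([^"]+)"\s+n="([^"]+)"\s*>')


def doe_to_ycoe(tag: str) -> str:
    """Convert a DOE <s ...> identifier into the YCOE <id,n> form."""
    match = _DOE_S_TAG.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a DOE identifier of the expected shape: {tag!r}")
    return f"<{match.group(1)},{match.group(2)}>"


if __name__ == "__main__":
    print(doe_to_ycoe('<s id="T02040000100" n="174.44">'))
```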
Text markup is reduced to a minimum so as not to interfere with the
annotation; what there is is in html-type format (<markup>
...</markup>). All text markup is additionally enclosed within a node
labelled CODE to differentiate it from the text. The following codes can be
found in the corpus:
(CODE <paren>) ... (CODE </paren>)
(CODE <COM:text_missing>) (CODE <TEXT:for+tan+te>) (CODE <MS:secga+d>)
( (CODE <INTERPOLATION>)) ... ( (CODE </INTERPOLATION>))
Emendations that are made either by the editor of the text or by the DOE
Corpus (labelled <corr>...</corr> in the DOE texts) are
marked as emendations with the emendation symbol $ at the beginning
of every word or partial word emended. No comment is added.
(NP-NOM-VOC (PRO^N +tu)
            (NP-NOM-PRN (ADJ^N $halige) (N^N $modor)))     <-- emended text

Emendations made to the text by the YCOE team fall into the following categories.
(NODE (IP-MAT-SPE (NEG Ne)
                  (VBPI $lyfast)            <-- LYFASTU separated in order
                  (NP-NOM (PRO^N $tu))          to allow annotation of subject
                  (CODE <TEXT:lyfastu>)     <-- TEXT comment
                  (PP (P o+d) (NP-ACC (N^A +afen))))
      (ID coaelive,+ALS_[Basil]:583.870))
( (CODE <T06560178800,47.363.3>) (IP-MAT-SPE (CONJ Ond) (ADVP (ADV for+d+am)) (NP-GEN (PRO^G min)) (NP-NOM (MAN^N monn)) (VBPI $eht) (CODE <TEXT:eft;eht_from_ms.Cotton>) (CP-ADV-SPE (C +de) (IP-SUB-SPE (NP-NOM (PRO^N ic)) (VBP bodige) (PP (P ymb) (NP-ACC (D^A +done) (N^A tohopan) (NP-GEN (NP-GEN (ADJ^G deadra) (N^G monna)) (N^G +arestes)))))) (. .)) (ID cocura,CP:47.363.3.2456))
( (CODE <T04890014900,289>) (IP-MAT-SPE (CONJ And) (ADVP-TMP (ADV^T nu)) (PTP-DAT-ABS (VBN^D geendodum) (NP-DAT-SBJ (N^D ryne))) (NP (PRO me)) (BEPI is) (VBN gehealden) (NP-NOM (NP-GEN (N^G rihtwisnysse)) (CODE <TEXT:weg;emendation_suggested_by_ed.>) (N^N wuldorbeah)) (. .)) (ID coeuphr,LS_7_[Euphr]:289.297))

( (CODE <T03910012500,107>) (IP-MAT (NEG Ne) (VBDI cw+a+d) (NP-NOM (PRO^N he)) (ADVP (NEG+ADV na) (ADV lichamlice) (CODE <TEXT:ne>) (CONJ ac) (ADV gastlice)) (. .)) (ID colwstan2,+ALet_3_[Wulfstan_2]:107.146))
(NODE (PP (P in)
          (NP-DAT (D^D +d+are) (N^D stowe)
                  (CP-REL (WNP-NOM-1 0)
                          (C $+te)             <-- replacement in text
                          (CODE <MS:+ta>)
                          (IP-SUB (NP-NOM *T*-1) (VBD hatte) (NP-NOM-PRD (NR^N Maiuma))))))
      (ID comart3,Mart_5_[Kotzor]:Oc21,A.35.2046))

(NP-NOM (NP-GEN (D^G +d+as) (N^G martyres) (NP-GEN-PRN *ICH*-1) (CP-REL *ICH*-3))
        (CODE <MS:tid>)          <-- omitted from text
        (N^N +trowung)
        (NP-GEN-PRN-1 (NR Sancti) (NR^G Genesi)))

(IP-INF (NP-ACC-SBJ (ADJ^A hwite) (N^A culfran))
        (PP (P of) (NP-DAT (N^D heofonum)))
        (VB $cuman)              <-- lacking in the ms.
        (CODE <MS:lacks_emendation>))
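Since every emended word or partial word begins with the emendation symbol $, emended leaves can be listed with a pattern over the bracketed text. This is a rough sketch rather than a corpus tool: the function name `emended_leaves` is hypothetical, and the pattern assumes a leaf has the shape `(LABEL $word)` with a single space between label and word.

```python
import re

# Assumption: an emended leaf is "(LABEL $word)" with one separating space.
_EMENDED_LEAF = re.compile(r"\(([^\s()]+) \$([^\s()]+)\)")


def emended_leaves(token: str) -> list[tuple[str, str]]:
    """Return (label, word) pairs for leaves whose word starts with $."""
    return _EMENDED_LEAF.findall(token)


if __name__ == "__main__":
    token = (
        "(NODE (IP-MAT-SPE (NEG Ne) (VBPI $lyfast) (NP-NOM (PRO^N $tu)) "
        "(CODE <TEXT:lyfastu>) (PP (P o+d) (NP-ACC (N^A +afen)))) "
        "(ID coaelive,+ALS_[Basil]:583.870))"
    )
    print(emended_leaves(token))
```

The $ is dropped from the returned word, so the result can be compared directly with the reading recorded in an accompanying TEXT or MS comment.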
All changes that were made to the DOE text (apart from the correction of
clear errors) are accompanied by a comment. There are three types of
(CODE <COM:text_missing>) (CODE <COM:conjectured_text_omitted>) (CODE <COM:ofercuman_glossed_by_onbegan>) (CODE <COM:emendation_from_ms.U>)