The Brooklyn Corpus: Syntactic Annotations, File Formats, and Search Tools

The syntactic annotations of the Brooklyn Corpus enable users to pose and answer questions about word order, constituent order, abstract structure, and the syntactic and morphological characteristics of the texts in the corpus. The annotations are general-purpose and as theory-neutral as possible, while still incorporating the insights of modern linguistic theory, so they can be used by scholars with widely varying research interests. It must be emphasized, however, that the annotations should in no way be regarded as the implementation of a structural analysis: the annotation schemes were developed primarily as a tool for investigating the structure of earlier stages of English.

The syntactic annotations mark constituents, both clausal and non-clausal, by labelled brackets, with some relations marked by empty categories. The structure assigned to a sentence by the labelled bracketing can be quite complex, but it is not a complete syntactic analysis: the function of the bracketing is not to assign a structure to Old English sentences but rather to facilitate searches.

Below is an example of an annotated sentence from the Brooklyn Corpus, with its gloss:

( [rt +ta/RT ] [vt com/VT ] [nn sum/JJN wydewe/NNN , 1[L [wh-1 seo/PDN ] %co% %nn-1% [vt w+as/ET ] [vn geciged/VN ] 2[S %nn-1% [nn Euthicia/NMN ] S]2 L]1 , ] [pp betwux/PP o+drum/JJD mannum/NND ] [pp to/PP +t+are/PDD m+aran/JJD byrigene/NND ] ... ) (AELIVE,I,210.5)

then came some widow, that was called Eutychia, between other persons to that famous grave ...

The sentence begins with the temporal adverb '+ta', followed by the finite verb 'com' and then the subject 'sum wydewe', which is modified by the relative clause 'seo w+as geciged Euthicia'. Two prepositional phrases follow: 'betwux o+drum mannum' and 'to +t+are m+aran byrigene'. The annotation scheme is described in detail in the manual accompanying the corpus.
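
To illustrate how the labelled bracketing supports searching, here is a minimal Python sketch that recovers the top-level constituents of the example sentence above. It is not one of the search tools distributed with the corpus. The function name `top_level_constituents` is hypothetical, and the sketch assumes only what is visible in the example: tokens are separated by whitespace, any token containing '[' opens a bracket, any token containing ']' closes one, and words appear as word/TAG pairs. The manual remains the authority on the full format.

```python
# The example sentence, copied from the text above.
SENTENCE = (
    "( [rt +ta/RT ] [vt com/VT ] "
    "[nn sum/JJN wydewe/NNN , 1[L [wh-1 seo/PDN ] %co% %nn-1% "
    "[vt w+as/ET ] [vn geciged/VN ] 2[S %nn-1% [nn Euthicia/NMN ] S]2 L]1 , ] "
    "[pp betwux/PP o+drum/JJD mannum/NND ] "
    "[pp to/PP +t+are/PDD m+aran/JJD byrigene/NND ] ... ) (AELIVE,I,210.5)"
)


def top_level_constituents(line):
    """Return [(label, [word, ...]), ...] for the outermost brackets.

    Assumption (inferred from the example only): a token containing '['
    opens a bracket, a token containing ']' closes one, and every word
    is written as word/TAG.
    """
    result = []
    depth = 0
    for tok in line.split():
        if "[" in tok:
            depth += 1
            if depth == 1:
                # The label is whatever follows '[', e.g. 'rt' in '[rt'.
                result.append((tok.split("[", 1)[1], []))
        elif "]" in tok:
            depth -= 1
        elif "/" in tok and depth >= 1:
            # Keep the word and drop its tag; words nested at any depth
            # are collected under the enclosing top-level constituent.
            result[-1][1].append(tok.rpartition("/")[0])
    return result


if __name__ == "__main__":
    for label, words in top_level_constituents(SENTENCE):
        print(label, " ".join(words))
```

Run on the example, the sketch yields five top-level constituents labelled rt, vt, nn, pp, and pp, matching the description above: adverb, finite verb, subject (with the words of its relative clause included), and two prepositional phrases. Empty categories such as %co% and punctuation tokens are skipped because they carry no word/TAG pair.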

Each text in the Brooklyn Corpus is supplied in four formats, each as a separate file with the same name and a different extension. The formats are suited to different search tools.