Contents of this chapter:

words and labels
string variations

words and labels

"Labels" are the all-upper-case tags inserted by the linguists who prepared the corpus (e.g., "IP", "CONJ", "N"). "Words" are the mostly lower-case words of the original text (e.g., "so", "hit"). Every node in the sentence tree has a label, and the leaf nodes also have words. CorpusSearch can search on labels or on words; in practice, most searches look for labels only.
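As a rough illustration of the label/word distinction (not CorpusSearch's own code, which this chapter does not show), the following Python sketch reads a labeled bracketing into nested `(label, children)` tuples, where a leaf is `(label, word)`, and then collects labels and words separately. The tree string and the helper names `parse_tree`, `labels`, and `words` are hypothetical, and the tokenizer assumes that words never contain spaces or parentheses.

```python
import re

# Hypothetical helpers for illustration; not part of CorpusSearch.

def parse_tree(text):
    """Parse a labeled bracketing into nested tuples.
    A leaf node is (label, word); an inner node is (label, [child, ...])."""
    # Tokens are "(", ")", or any run of characters that is neither
    # whitespace nor a parenthesis (assumes words contain no parentheses).
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] not in ("(", ")"):
            # Leaf: the label is followed directly by a single word.
            word = tokens[pos]
            assert tokens[pos + 1] == ")", "a leaf holds a single word"
            pos += 2  # skip the word and the closing ")"
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(parse_node())
        assert tokens[pos] == ")", "expected ')'"
        pos += 1  # skip the closing ")"
        return (label, children)

    return parse_node()

def labels(node):
    """Every node has a label: collect them all, in document order."""
    label, rest = node
    result = [label]
    if isinstance(rest, list):
        for child in rest:
            result.extend(labels(child))
    return result

def words(node):
    """Only leaf nodes have words: collect them, in document order."""
    label, rest = node
    if isinstance(rest, str):
        return [rest]
    result = []
    for child in rest:
        result.extend(words(child))
    return result

tree = parse_tree("(IP (CONJ so) (NP-SBJ (PRO he)) (VBD hit) (NP-OB1 (PRO it)))")
print(labels(tree))  # ['IP', 'CONJ', 'NP-SBJ', 'PRO', 'VBD', 'NP-OB1', 'PRO']
print(words(tree))   # ['so', 'he', 'hit', 'it']
```

Note that every node contributes a label, while only the four leaves contribute words.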

string variations

CorpusSearch matches search-function arguments to strings in the input using case-sensitive, character-by-character string matching. Therefore, spelling and upper-/lower-case variations must be listed explicitly, usually with an argument list (alternatives separated by "|"). For instance, this query searches for a complementizer whose associated text is "that" or "That":

(C iDominates that|That)

and finds sentences such as this:

/~*
and he shalle do yow remedy, that youre herte shal be pleasyd. '
(CMMALORY,3.47)
*~/

/*
    12 CP-ADV: 13 C that
*/

(
      (12 CP-ADV (13 C that)
                 (14 IP-SUB
                            (15 NP-SBJ (16 PRO$ youre) (17 N herte))
                            (18 MD shal)
                            (19 BE be)
                            (20 VAN pleasyd)))
      (ID CMMALORY,3.47))
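To make the matching rule concrete, here is a minimal Python sketch, assuming a tree represented as nested `(label, children)` tuples with leaves as `(label, word)`. The hypothetical `matches` splits an argument list on "|" and requires an exact, case-sensitive match with one alternative. The hypothetical `i_dominates` applies `matches` to a node's label and to each thing that node immediately dominates (a child's label, or a leaf's word). It handles only literal alternatives and is not CorpusSearch's implementation.

```python
# Hypothetical sketch; not CorpusSearch's implementation.

def matches(arg, string):
    """Case-sensitive, character-by-character match of `string` against
    an argument list such as "that|That" (alternatives separated by "|")."""
    return string in arg.split("|")

def i_dominates(tree, x_arg, y_arg):
    """Return (label, dominated) pairs where a node whose label matches
    x_arg immediately dominates a child label or leaf word matching y_arg."""
    hits = []

    def walk(node):
        label, rest = node
        if isinstance(rest, str):
            # Leaf: the only thing it immediately dominates is its word.
            if matches(x_arg, label) and matches(y_arg, rest):
                hits.append((label, rest))
            return
        for child in rest:
            if matches(x_arg, label) and matches(y_arg, child[0]):
                hits.append((label, child[0]))
            walk(child)

    walk(tree)
    return hits

def cp_adv(comp_word):
    # The CP-ADV from the CMMALORY,3.47 example, without node numbers.
    return ("CP-ADV", [
        ("C", comp_word),
        ("IP-SUB", [
            ("NP-SBJ", [("PRO$", "youre"), ("N", "herte")]),
            ("MD", "shal"),
            ("BE", "be"),
            ("VAN", "pleasyd"),
        ]),
    ])

for w in ("that", "That", "THAT"):
    print(w, i_dominates(cp_adv(w), "C", "that|That"))
# that [('C', 'that')]
# That [('C', 'That')]
# THAT []
```

Under this rule, "THAT" is not found because it is not one of the listed alternatives; finding it would require adding it to the argument list.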