Contents of this chapter:

General considerations
Search functions

CCommands
Column
Dominates
DomsWords
DomsWords<
DomsWords>
Exists
HasSister
iDominates
iDomsFirst
iDomsLast
iDomsMod
iDomsNumber
iDomsOnly
iDomsTotal
iDomsTotal<
iDomsTotal>
InID
iPrecedes
IsRoot
Precedes
SameIndex

General considerations

We commonly refer to the first argument to a search function as "x", and the second argument as "y".

To save typing and to improve readability, CorpusSearch allows shorthands and lower-case/upper-case variations for the names of search functions. Acceptable variants are listed below with each function.

When a function has an integer argument, there is always a space between the function and argument. This syntax is a change from earlier versions of CorpusSearch.

Search functions

CCommands (variants: cCommands, ccommands)

A node x CCommands a node y if and only if:
  1. x does not dominate y AND
  2. the first branching node dominating x does dominate y.
Thus, the following query:
query: (NP-SBJ* idoms PRO$) AND (PRO$ ccommands NP*)

finds examples like:
(NP-SBJ (PRO$ his)
        (ADVR+Q ouermoch)
        (N fearinge)
        (PP (P of)
            (NP (PRO you))))
in which a possessive pronoun CCommands a noun phrase, here the object of a prepositional complement to the head noun.
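
The two-part definition above can be sketched in Python. This is only an illustration of the definition, not CorpusSearch's own code: the Node class and the function names are hypothetical, and leaf text is stored as an attribute rather than as a separate node.

```python
class Node:
    """A labelled tree node (hypothetical helper); leaves carry text."""
    def __init__(self, label, *children, text=None):
        self.label, self.text = label, text
        self.children, self.parent = list(children), None
        for child in self.children:
            child.parent = self

def dominates(x, y):
    """True if y lies strictly below x in the tree."""
    node = y.parent
    while node is not None:
        if node is x:
            return True
        node = node.parent
    return False

def c_commands(x, y):
    """Condition 1: x does not dominate y.
    Condition 2: the first branching node above x dominates y."""
    if dominates(x, y):
        return False
    node = x.parent
    while node is not None and len(node.children) < 2:
        node = node.parent  # skip non-branching ancestors
    return node is not None and dominates(node, y)

# The example tree from the text.
pro_poss = Node("PRO$", text="his")
np_inner = Node("NP", Node("PRO", text="you"))
np_sbj = Node("NP-SBJ", pro_poss,
              Node("ADVR+Q", text="ouermoch"),
              Node("N", text="fearinge"),
              Node("PP", Node("P", text="of"), np_inner))

print(c_commands(pro_poss, np_inner))  # True
print(c_commands(np_inner, pro_poss))  # False
```

The relation is not symmetric: the first branching node above the inner NP is the PP, which does not contain the possessive pronoun.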

Column (variants: column, Col, col)

"Column" is used to search columns of the CODING node, or any other leaf whose text is written in columns separated by ":".

If, for instance, you want to find sentences whose CODING node contains an "m" or "n" in the 7th column, use this query:

query:  (CODING column 7 m|n)
If you want to find sentences whose CODING node does not contain a "p" or "q" in the 4th column, use this query:
query:  (CODING column 4 !p|q)
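
As a rough illustration of what a column test checks, here is a Python sketch. It assumes that columns are numbered from 1, that each alternative must equal the whole column, and that a missing column never matches; none of these details is confirmed by the text above. The function name and the sample CODING string are hypothetical.

```python
def column_matches(leaf_text, column, spec):
    """Test the 1-based `column` of a ':'-separated leaf against `spec`.

    `spec` is a '|'-separated list of alternatives; a leading '!'
    negates the test, as in the queries above.
    """
    negate = spec.startswith("!")
    alternatives = (spec[1:] if negate else spec).split("|")
    fields = leaf_text.split(":")
    if column < 1 or column > len(fields):
        return False  # assumption: a missing column never matches
    hit = fields[column - 1] in alternatives
    return not hit if negate else hit

coding = "a:b:c:d:e:f:m:x"  # hypothetical CODING leaf
print(column_matches(coding, 7, "m|n"))   # True
print(column_matches(coding, 4, "!p|q"))  # True
```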

Dominates (variants: dominates, Doms, doms)

dominates means "dominates to any generation": y is contained in the subtree dominated by x. Dominates accepts text as y, but text as x always returns an empty set, because text never dominates a subtree. Notice that the following query uses the escape character, "\", to search for *arb*:

(IP-INF dominates \*arb*)

It returns this sentence:

/~*
And soo by the counceil of Merlyn the kyng lete calle his barons to counceil,
(CMMALORY,14.419)
*~/

/*
    18 IP-INF: 19 NP-SBJ *arb*
*/

(
      (18 IP-INF (19 NP-SBJ *arb*)
                 (20 VB calle)
                 (21 NP-OB1 (22 PRO$ his) (23 NS barons))
                 (24 PP (25 P to)
                        (26 NP (27 N counceil))))
      (ID CMMALORY,14.419))

DomsWords (variants: domsWords, domswords)

domsWords counts the number of words dominated by the search-function argument. So "domsWords 4" means "dominates 4 words", "domsWords 2" means "dominates 2 words", and so on. A word in this case is defined as a leaf node that is not on the word_ignore_list. Here's the default word_ignore_list:

RMV:*|COMMENT|CODE|ID|LB|'|\"|,|E_S|0|\**

Thus, traces, 0 complementizers, punctuation, and comments are not counted as words.

So this query:

(NP-OB* domsWords 3)

will return this structure (ignoring the trace *ICH*-1):

/~*
and by kynge Ban and Bors his counceile they lette brenne and destroy all the
contrey before them there they sholde ryde.
(CMMALORY,20.613)
*~/

/*
    24 NP-OB1: 27 N contrey
*/

(
      (24 NP-OB1 (25 Q all)
                 (26 D the)
                 (27 N contrey)
                 (28 CP-REL *ICH*-1))
      (ID CMMALORY,20.613))

(only the NP-OB1 node was printed in this output because the query file included the line "node: NP*").
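
The word count for this example can be reproduced with a short Python sketch. It rests on two assumptions that the text does not spell out: a leaf is ignored when either its label or its text matches an entry of the word_ignore_list, and in each entry "*" is a wildcard while "\" makes the next character literal. The helper names are hypothetical.

```python
import re

# The default word_ignore_list quoted above, split on "|" by hand.
IGNORE = ["RMV:*", "COMMENT", "CODE", "ID", "LB", "'", '\\"',
          ",", "E_S", "0", "\\**"]

def to_regex(pattern):
    """Turn '*' into a wildcard; a backslash makes the next char literal."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "*":
            out.append(".*")
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out))

IGNORE_RX = [to_regex(p) for p in IGNORE]

def count_words(leaves):
    """Count (label, text) leaves that are not on the ignore list."""
    return sum(
        1 for label, text in leaves
        if not any(rx.fullmatch(label) or rx.fullmatch(text)
                   for rx in IGNORE_RX)
    )

# Leaves of the NP-OB1 node shown above.
np_ob1 = [("Q", "all"), ("D", "the"), ("N", "contrey"),
          ("CP-REL", "*ICH*-1")]
print(count_words(np_ob1))  # 3
```

The trace *ICH*-1 is skipped because it matches the last entry of the list, so the node counts as dominating three words.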

DomsWords< (variants: domsWords<, domswords<)

domsWords< is just like domsWords except that it returns structures that dominate strictly less than the given number of words. For instance, this query:

(NP-OB* domsWords< 3)

will return this structure:

/~*
for it was I myself that cam in the lykenesse.
(CMMALORY,5.131)
*~/

/*
    6 NP-OB1: 9 PRO$+N myself
*/

(
      (6 NP-OB1 (7 PRO I)
                (8 NP-PRN (9 PRO$+N myself)))
      (ID CMMALORY,5.131))

(only the NP-OB1 node was printed in this output because the query file included the line "node: NP*").

DomsWords> (variants: domsWords>, domswords>)

domsWords> is just like domsWords except that it returns structures that dominate strictly more than the given number of words. For instance, this query:

(NP-OB* domsWords> 3)

will return this structure:

/~*
for she was called a fair lady and a passynge wyse,
(CMMALORY,2.9)
*~/

/*
    9 NP-OB1: 20 ADJ wyse
*/

(
      (9 NP-OB1
                (10 NP (11 D a) (12 ADJ fair) (13 N lady))
                (14 CONJP (15 CONJ and)
                          (16 NP (17 D a)
                                 (18 ADJP (19 ADV passynge) (20 ADJ wyse)))))
      (ID CMMALORY,2.9))

(only the NP-OB1 node was printed in this output because the query file included the line "node: NP*").

Exists (variants: exists)

exists searches for a label or text anywhere in the sentence. For instance, this query:

(MD0 exists)

will find this sentence:

/~*
but I fere me that I shal not conne wel goo thyder /
(CMREYNAR,14.261)
*~/

/*
    10 IP-SUB: 15 MD0 conne
*/

(
      (10 IP-SUB
                 (11 NP-SBJ (12 PRO I))
                 (13 MD shal)
                 (14 NEG not)
                 (15 MD0 conne)
                 (16 ADVP (17 ADV wel))
                 (18 VB goo)
                 (19 ADVP-DIR (20 ADV thyder)))
      (ID CMREYNAR,14.261))

A common mistake is to use "exists" unnecessarily, as in this example:

(MD exists) AND (MD iPrecedes VB)

If a sentence contains the structure (MD iPrecedes VB), MD necessarily exists in the sentence. So this query would get the same result:

(MD iPrecedes VB)

HasSister (variants: hasSister, hassister)

x hasSister y if x and y have the same mother. It doesn't matter whether x precedes y or y precedes x. So this query:
node: IP*
query: (NP-SBJ hasSister BE*)
finds both of these sentences:

/~*
indeede I must be gone:
(DELONEY,69.13)
*~/
/*
1 IP-MAT-SPE:  5 NP-SBJ, 10 BE
*/


( (IP-MAT-SPE (PP (P+N indeede))
              (NP-SBJ (PRO I))
              (MD must)
              (BE be)
              (VBN gone)
              (. :))
  (ID DELONEY,69.13))

/~*
I pray you is it true?
(DELONEY,70.47)
*~/
/*
13 IP-SUB-SPE:  16 NP-SBJ, 14 BEP
*/


( (CP-QUE-SPE (IP-MAT-PRN-SPE (NP-SBJ (PRO I))
                              (CODE {TEMP:prn_ok})
                              (VBP pray)
                              (NP-ACC (PRO you)))
              (IP-SUB-SPE (BEP is)
                          (NP-SBJ (PRO it))
                          (ADJP (ADJ true)))
              (. ?))
  (ID DELONEY,70.47))

iDominates (variants: idominates, iDoms, idoms)

iDominates means "immediately dominates". That is, x immediately dominates y if y is a child of x (exactly one generation below x). So this query:

((NP* iDominates FP) AND (FP iDominates ane))

finds this sentence:

/~*
Sythen he ledes +tam by +tar ane,
(CMROLLEP,118.978)
*~/

/*
    1 IP-MAT: 11 NP, 13 FP ane
*/

(0
   (1 IP-MAT
             (2 ADVP-TMP (3 ADV Sythen))
             (4 NP-SBJ (5 PRO he))
             (6 VBP ledes)
             (7 NP-OB1 (8 PRO +tam))
             (9 PP (10 P by)
                   (11 NP (12 PRO$ +tar) (13 FP ane)))
             (14 E_S ,))
      (ID CMROLLEP,118.978))


Notice that "iDominates" describes the relationship between a label and its associated text (e.g., "FP" and "ane").

iDomsFirst (variants: idomsfirst)

"iDomsFirst" means "immediately dominates as a first child."

For instance, this query:

node: IP*
query: (NP* iDomsFirst PRO$)

results in this output:

/~*
My Lady yor mother, I thanke God, is very well and cheerly,
(KNYVETT-1630,86.12)
*~/
/*
1 IP-MAT:  2 NP-SBJ, 3 PRO$
1 IP-MAT:  7 NP-PRN, 8 PRO$
*/

( (IP-MAT (NP-SBJ (PRO$ My)
                  (N Lady)
                  (NP-PRN (PRO$ yor) (N mother)))
          (, ,)
          (IP-MAT-PRN (NP-SBJ (PRO I))
                      (VBP thanke)
                      (NP-ACC (NPR God)))
          (, ,)
          (BEP is)
          (ADJP (ADJP (ADV very) (ADJ well))
                (CONJP (CONJ and)
                       (ADJX (ADJ cheerly))))
          (. ,))
  (ID KNYVETT-1630,86.12))

iDomsLast (variants: idomslast)

"iDomsLast" means "immediately dominates as a last child."

So this query:

node: IP*
query: (IP* iDomsLast BEN)

results in this output:

/~*
but keepes her chamber because of the Bitter weather that hath been.
(KNYVETT-1630,86.13)
*~/
/*
31 IP-SUB:  31 IP-SUB, 36 BEN
*/

( (IP-MAT (CONJ but)
          (NP-SBJ *con*)
          (VBP keepes)
          (NP-ACC (PRO$ her) (N chamber))
          (PP (P+N because)
              (PP (P of)
                  (NP (D the)
                      (ADJ Bitter)
                      (N weather)
                      (CP-REL (WNP-1 0)
                              (C that)
                              (IP-SUB (NP-SBJ *T*-1)
                                      (HVP hath)
                                      (BEN been))))))
          (. .))
  (ID KNYVETT-1630,86.13))

iDomsMod (variants: idomsmod)

x immediately dominates (mod z) y if x dominates y, and the only nodes intervening on the path from x to y (if any) are members of z. Note that if no nodes at all intervene on the path from x to y, the function is true. To search for pronominal subjects mod conjunction, you can use the following query:
node: IP*
query: (NP-SBJ iDomsMod NP*|CONJ* PRO)
It finds this sentence:

/~*
So by the entrete at the last the kyng and she met togyder.
(CMMALORY,4.104)
*~/
/*
1 IP-MAT:  21 NP-SBJ, 31 PRO, 27 CONJP
*/


(0  (1 IP-MAT (2 ADVP (3 ADV So))
              (5 PP (6 P by)
                    (8 NP (9 D the) (11 N entrete)))
              (13 PP (14 P at)
                     (16 NP (17 D the) (19 ADJ last)))
              (21 NP-SBJ (22 NP (23 D the) (25 N kyng))
                         (27 CONJP (28 CONJ and)
                                   (30 NP (31 PRO she))))
              (33 VBD met)
              (35 ADVP (36 ADV togyder))
              (38 E_S .))
    (40 ID CMMALORY,4.104))
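
The path condition behind iDomsMod can be sketched in Python as follows. This is an illustration of the definition only; the Node class and function names are hypothetical, and Python's fnmatch wildcards stand in for CorpusSearch's label patterns.

```python
from fnmatch import fnmatchcase

class Node:
    """A labelled tree node with a parent pointer (hypothetical helper)."""
    def __init__(self, label, *children):
        self.label, self.children, self.parent = label, list(children), None
        for child in children:
            child.parent = self

def matches(label, pattern):
    """Match a label against '|'-separated alternatives with '*' wildcards."""
    return any(fnmatchcase(label, alt) for alt in pattern.split("|"))

def idoms_mod(x, z, y):
    """True if x dominates y and every node strictly between them matches z."""
    node = y.parent
    while node is not None and node is not x:
        if not matches(node.label, z):
            return False  # an intervening node is not a member of z
        node = node.parent
    return node is x

# The subject of the example sentence, without the leaf text.
pro = Node("PRO")
np_first = Node("NP", Node("D"), Node("N"))
np_sbj = Node("NP-SBJ", np_first,
              Node("CONJP", Node("CONJ"), Node("NP", pro)))

print(idoms_mod(np_sbj, "NP*|CONJ*", pro))  # True: only CONJP and NP intervene
print(idoms_mod(np_sbj, "NP*", pro))        # False: CONJP is not in z
```

With no intervening nodes the loop body never runs, so a direct child always satisfies the function, as the definition requires.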

iDomsNumber (variants: idomsnumber, iDomsNum, idomsnum)

iDomsNumber means "immediately dominates as the #th child". That is, x immediately dominates y as the #th child if x immediately dominates y and y is the #th child of x. Notice that iDomsNumber 1 is a superset of iDomsOnly. This query:

(CP-DEG iDomsNumber 1 C)

produces this output:

/~*
And Merlion was so disgysed that kynge Arthure knewe hym nat,
(CMMALORY,30.939)
*~/

/*
    1 IP-MAT: 9 CP-DEG, 10 C that
*/

(0
   (1 IP-MAT (2 CONJ And)
             (3 NP-SBJ (4 NPR Merlion))
             (5 BED was)
             (6 ADJP (7 ADVR so)
                     (8 VAN disgysed)
                     (9 CP-DEG (10 C that)
                               (11 IP-SUB
                                          (12 NP-SBJ (13 NPR kynge) (14 NPR Arthure))
                                          (15 VBD knewe)
                                          (16 NP-OB1 (17 PRO hym))
                                          (18 NEG nat))))
             (19 E_S ,))
      (ID CMMALORY,30.939))

iDomsOnly (variants: idomsonly)

iDomsOnly means "immediately dominates as an only child." That is, x immediately dominates y as an only child if x immediately dominates y and y is the only legitimate child of x. So this query:

(ADJP iDomsOnly Q*)

results in this output:

 
/~*
But after my lytyll wytt it semeth me, sauynge here reuerence, +tat is more.
(CMMANDEV,123.2992)
*~/

/*
    23 IP-SUB: 27 ADJP, 28 QR more
*/

(
      (23 IP-SUB
                 (24 NP-SBJ (25 D +tat))
                 (26 BEP is)
                 (27 ADJP (28 QR more)))
      (ID CMMANDEV,123.2992))
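
The positional variants of immediate dominance differ only in which child they inspect. The Python sketch below makes that explicit and shows why every iDomsOnly match is also an iDomsNumber 1 match. It works on a plain list of child labels, uses fnmatch wildcards in place of CorpusSearch patterns, and treats every child as "legitimate"; all three are simplifying assumptions, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

def idoms_number(child_labels, n, pattern):
    """True if the n-th child (counting from 1) matches `pattern`."""
    return 1 <= n <= len(child_labels) and fnmatchcase(child_labels[n - 1], pattern)

def idoms_first(child_labels, pattern):
    return idoms_number(child_labels, 1, pattern)

def idoms_last(child_labels, pattern):
    return idoms_number(child_labels, len(child_labels), pattern)

def idoms_only(child_labels, pattern):
    """True if there is exactly one child and it matches `pattern`."""
    return len(child_labels) == 1 and fnmatchcase(child_labels[0], pattern)

cp_deg = ["C", "IP-SUB"]           # children of CP-DEG in the iDomsNumber example
adjp = ["QR"]                      # children of ADJP in the iDomsOnly example
ip_sub = ["NP-SBJ", "HVP", "BEN"]  # children of IP-SUB in the iDomsLast example

print(idoms_number(cp_deg, 1, "C"))  # True
print(idoms_only(cp_deg, "C"))       # False: C has a sister
print(idoms_only(adjp, "Q*"))        # True
print(idoms_number(adjp, 1, "Q*"))   # True: an only child is also the first child
print(idoms_last(ip_sub, "BEN"))     # True
```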

iDomsTotal (variants: idomstotal)

iDomsTotal counts the number of nodes immediately dominated by the search-function argument. So this query:

(NP-OB* iDomsTotal 3)

results in this output:

/~*
And +tere it lykede him to suffre many repreuynges and scornes for vs
(CMMANDEV,1.4)
*~/

/*
    10 IP-INF-1: 13 NP-OB1, 16 CONJP
*/

(
      (10 IP-INF-1 (11 TO to)
                   (12 VB suffre)
                   (13 NP-OB1 (14 Q many)
                              (15 NS repreuynges)
                              (16 CONJP (17 CONJ and)
                                        (18 NX (19 NS scornes))))
                   (20 PP (21 P for)
                          (22 NP (23 PRO vs))))
      (ID CMMANDEV,1.4))

Here, the 3 nodes immediately dominated by NP-OB1 are labelled Q, NS, and CONJP.

iDomsTotal< (variants: idomstotal<)

iDomsTotal< is like iDomsTotal except that it returns structures that immediately dominate strictly less than the given number of nodes. So this query:

(NP-OB* iDomsTotal< 3)

results in this output:

/~*
& take of euereche iliche myche
(CMHORSES,125.397)
*~/

/*
    1 IP-IMP: 8 NP-OB1, 9 QP
*/

(0
   (1 IP-IMP (2 CONJ &)
             (3 VBI take)
             (4 PP (5 P of)
                   (6 NP (7 Q euereche)))
             (8 NP-OB1
                       (9 QP (10 ADV iliche) (11 Q myche))))
      (ID CMHORSES,125.397))

iDomsTotal> (variants: idomstotal>)

iDomsTotal> is like iDomsTotal except that it returns structures that immediately dominate strictly more than the given number of nodes. So this query:

(NP-OB* iDomsTotal> 3)

will produce this output:

/~*
& aftur tak an hot yre +tat is smal bi-fore
(CMHORSES,95.119)
*~/

/*
    1 IP-IMP: 6 NP-OB1, 10 CP-REL
*/

(0
   (1 IP-IMP (2 CONJ &)
             (3 ADVP-TMP (4 ADV aftur))
             (5 VBI tak)
             (6 NP-OB1 (7 D an)
                       (8 ADJ hot)
                       (9 N yre)
                       (10 CP-REL (11 WNP-1 0)
                                  (12 C +tat)
                                  (13 IP-SUB (14 NP-SBJ *T*-1)
                                             (15 BEP is)
                                             (16 ADJP (17 ADJ smal))
                                             (18 ADVP-LOC (19 ADV bi-fore))))))
      (ID CMHORSES,95.119))

InID (variants: inID)

"inID" searches the ID node. Because the ID node is outside of the parsed sentence, it is not encountered by the other search functions. For instance, (ID iDominates *) will turn up empty.

Here's a typical ID node from the Malory corpus file:

(ID CMMALORY,3.41)

To isolate Malory sentences from an output file, you could use this query:

query:  (*MALORY* inID)

iPrecedes (variants: iprecedes, iPres, ipres)

The algorithm for "x iPrecedes y" runs as follows:

1.) Find x.

2.) If x has an immediately following sister, then that sister and all its leftmost descendants (that is, the first child of the sister, the first child of the first child, and on as far as the tree goes) are candidates for y.

3.) If x has no immediately following sister, recurse from 2.) with the mother of x in place of x.

Notice that this algorithm ensures that every word in the original text immediately precedes the word after it. This was not true when "iPrecedes" meant "immediately sister precedes".
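
The three steps above can be sketched in Python. This is a reading of the algorithm as described, not CorpusSearch's own code; the Node class and function name are hypothetical. Text is modelled as a childless node under its label, and the tree is a truncated version of the ADVP-TMP from the example that follows.

```python
class Node:
    """A labelled tree node with a parent pointer (hypothetical helper)."""
    def __init__(self, label, *children):
        self.label, self.children, self.parent = label, list(children), None
        for child in children:
            child.parent = self

def iprecedes_candidates(x):
    """Return every node y for which 'x iPrecedes y' holds."""
    node = x
    while node.parent is not None:
        sisters = node.parent.children
        pos = next(i for i, s in enumerate(sisters) if s is node)
        if pos + 1 < len(sisters):
            # Step 2: the following sister and its leftmost descendants.
            candidates = [sisters[pos + 1]]
            while candidates[-1].children:
                candidates.append(candidates[-1].children[0])
            return candidates
        # Step 3: no following sister, so retry with the mother.
        node = node.parent
    return []

as1, sone, as2 = Node("as"), Node("sone"), Node("as")
advp = Node("ADVP-TMP",
            Node("ADVR", as1),
            Node("ADV", sone),
            Node("PP", Node("P", as2)))

print([n.label for n in iprecedes_candidates(as1)])   # ['ADV', 'sone']
print([n.label for n in iprecedes_candidates(sone)])  # ['PP', 'P', 'as']
```

Because the candidates include the leftmost leaf of the following sister, each word is found to immediately precede the next word, even when the two are not sisters.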

For instance, this query:

query: ([1]as iPrecedes sone) AND (sone iPrecedes [2]as)

produces this output:

/~*
and as sone as he myght he toke his horse
(CMMALORY,206.3401)
*~/
/*
1 IP-MAT:  6 as, 8 sone, 11 as
*/


( (IP-MAT (CONJ and)
          (ADVP-TMP (ADVR as)
                    (ADV sone)
                    (PP (P as)
                        (CP-CMP (WADVP-1 0)
                                (C 0)
                                (IP-SUB (ADVP-TMP *T*-1)
                                        (NP-SBJ (PRO he))
                                        (MD myght)
                                        (VB *)))))
          (NP-SBJ (PRO he))
          (VBD toke)
          (NP-OB1 (PRO$ his) (N horse)))
  (ID CMMALORY,206.3401))

IsRoot (variants: isRoot, isroot)

isRoot searches for the argument label at the root of the tree. For instance, this query:

query: (CP* isRoot)

produces this output:

/~*
why thou whoreson when wilt thou be maried?
(DELONEY,79.296)
*~/
/*
1 CP-QUE-SPE:  1 CP-QUE-SPE
*/


( (CP-QUE-SPE (INTJP (WADV why))
              (NP-VOC (PRO thou) (N$+N whoreson))
              (WADVP-1 (WADV when))
              (IP-SUB-SPE (ADVP *T*-1)
                          (MD wilt)
                          (NP-SBJ (PRO thou))
                          (BE be)
                          (VAN maried))
              (. ?))
  (ID DELONEY,79.296))

Precedes (variants: precedes, Pres, pres)

"x precedes y" means "x comes before y in the tree but x does not dominate y". So this query:

(VB precedes NP-OB*)

produces this output:

/~*
thenne have ye cause to make myghty werre upon hym. '
(CMMALORY,2.25)
*~/

/*
    9 IP-INF-PRP: 11 VB make, 12 NP-OB1
*/

(
      (9 IP-INF-PRP (10 TO to)
                    (11 VB make)
                    (12 NP-OB1 (13 ADJ myghty)
                               (14 N werre)
                               (15 PP (16 P upon)
                                      (17 NP (18 PRO hym)))))
      (ID CMMALORY,2.25))
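
One way to read this definition is in terms of leaf positions: x precedes y when all the leaves under x come before all the leaves under y. The Python sketch below follows that reading, which is an assumption and not necessarily how CorpusSearch computes the relation. Trees are nested tuples and the function names are hypothetical.

```python
def label_spans(tree):
    """List (label, start, end) for each labelled node, in preorder.

    `start` and `end` are leaf positions; `end` is exclusive.
    """
    out = []

    def walk(node, start):
        label, *kids = node
        slot = len(out)
        out.append(None)  # reserve the preorder position
        pos = start
        for kid in kids:
            pos = pos + 1 if isinstance(kid, str) else walk(kid, pos)
        out[slot] = (label, start, pos)
        return pos

    walk(tree, 0)
    return out

def precedes(a, b):
    """True if span a ends before span b begins."""
    return a[2] <= b[1]

# The IP-INF-PRP node from the example above.
tree = ("IP-INF-PRP", ("TO", "to"), ("VB", "make"),
        ("NP-OB1", ("ADJ", "myghty"), ("N", "werre"),
         ("PP", ("P", "upon"), ("NP", ("PRO", "hym")))))
span = {s[0]: s for s in label_spans(tree)}  # labels are unique in this tree

print(precedes(span["VB"], span["NP-OB1"]))  # True
print(precedes(span["NP-OB1"], span["PP"]))  # False: NP-OB1 dominates PP
```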

SameIndex (variants: sameIndex, sameindex)

x sameIndex y finds structures where x ends with the same index as y. This is useful in searching for antecedents with the same index as a trace. For instance, this query:

ignore_nodes: null
node: IP*
query: (NP* iDoms \*exp\*) AND (NP* sameIndex CP*)

finds this sentence:

/~*
hym thought there was com into hys londe gryffens and serpentes,
(CMMALORY,33.1031)
*~/
/*
1 IP-MAT:  2 NP-SBJ-1, 3 *exp*, 9 CP-THT-1
*/

( (IP-MAT (NP-SBJ-1 *exp*)
          (NP-OB2 (PRO hym))
          (VBD thought)
          (CP-THT-1 (C 0)
                    (IP-SUB (NP-SBJ-2 (EX there))
                            (BED was)
                            (VBN com)
                            (PP (P into)
                                (NP (PRO$ hys) (N londe)))
                            (NP-2 (NS gryffens) (CONJ and) (NS serpentes))))
          (E_S ,))
  (ID CMMALORY,33.1031))
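
The index comparison can be illustrated with a short Python sketch. It assumes that an index is a run of digits after the final hyphen of a label; other index notations that a corpus may use are not handled. The function names are hypothetical.

```python
import re

def trailing_index(label):
    """Return the digits after a final hyphen, or None if there are none."""
    match = re.search(r"-(\d+)$", label)
    return match.group(1) if match else None

def same_index(x_label, y_label):
    """True if both labels end in an index and the indices are equal."""
    index = trailing_index(x_label)
    return index is not None and index == trailing_index(y_label)

print(same_index("NP-SBJ-1", "CP-THT-1"))  # True
print(same_index("NP-SBJ-2", "CP-THT-1"))  # False
print(same_index("NP-OB2", "NP-2"))        # False: the 2 in OB2 is part of the tag
```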