Contents of this chapter:

revision feature
don't repeat flags
label changes
replace_label
append_label
prepend_label
pre_crop_label
post_crop_label
structural changes
add_leaf_before
add_leaf_after
move_up_node
move_up_nodes
add_internal_node
delete_leaf
delete_node
delete_subtree
examples

revision feature

CorpusSearch 2 has a corpus-revision feature, which allows the user to make automatic changes to a corpus. This is useful, for instance, in correcting parser errors, or revising a corpus to fit new annotation guidelines.

Revisions are linked to a standard CS query, which is decorated with curly-bracket tags indicating where revisions should take place. The curly brackets contain an index which correlates an argument in the query to a revision instruction. I'll call the curly-bracket construction a "flag". This is the general idea:

query: ({x}A function B) AND (C function {y}D)

revise{x}: info
revise{y}: info

Also see the examples.

don't repeat flags

Suppose you have a query where the same node is mentioned several times. You may be tempted to flag the node every time it appears in the query, as below:

WRONG!
query: (NP* iDoms {1}[1]Q)
       AND (NP* iDoms {2}[2]Q)
       AND ({1}[1]Q iPrecedes {2}[2]Q)
add_internal_node{1, 2}: QP

The problem with this is that CorpusSearch only needs to have the arguments flagged once, and repeating the flags just increases the possibility of error (for instance, the same flag might wind up referring to two different nodes). For this reason, CorpusSearch ignores repeated flags, and issues a warning when they are encountered. The above query produces these WARNING messages:

WARNING!  Subsequent flag {1} has been ignored.

WARNING!  Subsequent flag {2} has been ignored.

This version of the query is preferred:

query: (NP* iDoms {1}[1]Q)
       AND (NP* iDoms {2}[2]Q)
       AND ([1]Q iPrecedes [2]Q)
add_internal_node{1, 2}: QP

label changes

The simplest way to change a tree is to change labels, leaving the structure intact. CS has the following label-changing revision functions:

replace_label
replace_label{x}: new_label

append_label
append_label{x}: label_to_append

prepend_label
prepend_label{x}: label_to_prepend

post_crop_label
post_crop_label{x}: label_to_crop

pre_crop_label
pre_crop_label{x}: label_to_crop

replace_label

This query:
node: IP*
query:  ({1}NP-ACC iDoms N*)

replace_label{1}: BULLWINKLE
applied to this input:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (NP-ACC (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,)) (ID KNYVETT-1630,87.25))
produces this output:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (BULLWINKLE (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,))
  (ID KNYVETT-1630,87.25))

append_label

This appends the given label to the flagged argument. This query:
node: $ROOT

query: ({1}WPRO iDoms what|What) AND (WPRO iPrecedes IP*)

append_label{1}: -THAT
applied to this input:
( (IP-MAT (CONJ but)
          (CP-QUE (WNP-1 (WPRO what))
                  (IP-SUB (NP-TMP *T*-1)
                          (NP-SBJ (PRO I))
                          (MD shall)
                          (VB returne)
                          (NP-DIR (N home))))
          (NP-SBJ (PRO I))
          (BEP am)
          (ADJP (NP-MSR (D a) (Q little))
                (ADJ doubtfull))
          (. .)) (ID KNYVETT-1630,94.268))
produces this output:
/~*
but what I shall returne home I am a little doubtfull.
(KNYVETT-1630,94.268)
*~/
/*
1 IP-MAT:  6 WPRO, 7 what, 8 IP-SUB
*/


( (IP-MAT (CONJ but)
          (CP-QUE (WNP-1 (WPRO-THAT what))
                  (IP-SUB (NP-TMP *T*-1)
                          (NP-SBJ (PRO I))
                          (MD shall)
                          (VB returne)
                          (NP-DIR (N home))))
          (NP-SBJ (PRO I))
          (BEP am)
          (ADJP (NP-MSR (D a) (Q little))
                (ADJ doubtfull))
          (. .))
  (ID KNYVETT-1630,94.268))

prepend_label

This prepends the given label to the flagged argument. This query:
node: $ROOT
ignore_nodes: null
query: ({1}[1], iDoms [2],) AND ([1], iPres *-PRN)
       AND (*-PRN iPres [3],) AND ({2}[3], iDoms [4],)

prepend_label{1}: PRN-
prepend_label{2}: PRN-
applied to this input:
( (IP-MAT (CONJ &)
          (NP-SBJ (PRO$ my) (NS horsses))
          (, ,)
          (IP-MAT-PRN (NP-SBJ (PRO I))
                      (VBP thinke))
          (, ,)
          (MD $wil)
          (BE $be)
          (CODE {TEXT:wilbe})
          (VBN gone)
          (PP (P to)
              (NP (N morrowe)))
          (. ,)) (ID KNYVETT-1630,93.228))
produces this output:
/~*
& my horsses, I thinke, $wil $be gone to morrowe,
(KNYVETT-1630,93.228)
*~/
/*
1 IP-MAT:  9 ,, 10 ,, 11 IP-MAT-PRN, 17 ,, 18 ,
*/


( (IP-MAT (CONJ &)
          (NP-SBJ (PRO$ my) (NS horsses))
          (PRN-, ,)
          (IP-MAT-PRN (NP-SBJ (PRO I))
                      (VBP thinke))
          (PRN-, ,)
          (MD $wil)
          (BE $be)
          (CODE {TEXT:wilbe})
          (VBN gone)
          (PP (P to)
              (NP (N morrowe)))
          (. ,))
  (ID KNYVETT-1630,93.228))

pre_crop_label

This crops the front of the label: everything up to and including the given character is removed. This query:
node: $ROOT

query: (ADVP* iDoms {1}ADV+*)

pre_crop_label{1}: +
applied to this input:
( (IP-MAT (CONJ &)
          (NP-SBJ (Q many))
          (VBD lost)
          (NP-ACC (PRO$ ther) (NS lifes))
          (PP (PP (P aboute)
                  (NP (D the) (NS Teames)))
              (CONJP (CONJ &)
                     (ADVP-LOC (ADV+WADV elsewher))))
          (. .)) (ID KNYVETT-1630,87.21))
results in this output:
/~*
& many lost ther lifes aboute the Teames & elsewher.
(KNYVETT-1630,87.21)
*~/
/*
1 IP-MAT:  26 ADVP-LOC, 27 ADV+WADV
*/

( (IP-MAT (CONJ &)
          (NP-SBJ (Q many))
          (VBD lost)
          (NP-ACC (PRO$ ther) (NS lifes))
          (PP (PP (P aboute)
                  (NP (D the) (NS Teames)))
              (CONJP (CONJ &)
                     (ADVP-LOC (WADV elsewher))))
          (. .))
  (ID KNYVETT-1630,87.21))

post_crop_label

This crops the end of the label: everything from the indicated character onward is removed. The example below then appends a new ending to the cropped label.

This query:

node: IP*
query:  ({1}NP-ACC iDoms N*)

post_crop_label{1}: -
append_label{1}: -OBJ
applied to this input:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (NP-ACC (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,)) (ID KNYVETT-1630,87.25))
produces this output:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (NP-OBJ (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,))
  (ID KNYVETT-1630,87.25))
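
As a rough model of the five label functions, here is a Python sketch that treats a label as a plain string. It is not the CorpusSearch implementation. One assumption in particular goes beyond the examples: when the crop character occurs more than once in a label, the sketch crops at the first occurrence, which the examples in this chapter do not settle.

```python
def replace_label(label, new_label):
    # The old label is discarded entirely.
    return new_label

def append_label(label, label_to_append):
    return label + label_to_append

def prepend_label(label, label_to_prepend):
    return label_to_prepend + label

def pre_crop_label(label, char):
    # Remove everything up to and including char.
    # Assumption: the FIRST occurrence of char is used.
    head, sep, tail = label.partition(char)
    return tail if sep else label

def post_crop_label(label, char):
    # Remove char and everything after it.
    # Assumption: the FIRST occurrence of char is used.
    head, sep, tail = label.partition(char)
    return head

print(replace_label("NP-ACC", "BULLWINKLE"))
print(append_label("WPRO", "-THAT"))
print(prepend_label(",", "PRN-"))
print(pre_crop_label("ADV+WADV", "+"))
# post_crop_label followed by append_label, as in the last example above
print(append_label(post_crop_label("NP-ACC", "-"), "-OBJ"))
```

The last line mirrors the post_crop_label example: the two revisions are applied in the order they are listed, so NP-ACC is first cut back to NP and then extended to NP-OBJ.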

structural changes

CS has the following structure-changing revision functions. Use them with care, and always keep a backup copy of your original file.

add_leaf_before
add_leaf_before{x}: (pos text)

add_leaf_after
add_leaf_after{x}: (pos text)

move_up_node
move_up_node{x}:

move_up_nodes
move_up_nodes{x, y}:

add_internal_node
add_internal_node{x, y}: new_label

delete_leaf
delete_leaf{x}:

delete_node
delete_node{x}:

delete_subtree
delete_subtree{x}:

It is possible for the described change to result in an illegal tree, that is, a tree with crossing branches, or a tree containing an internal node with no leaf descendants (a "pollarded" tree, so to speak). If this is the case, a warning is given and the tree is not changed.

add_leaf_before, add_leaf_after

This query:
node: IP*
query:  (PP iDoms {1}P)

add_leaf_before{1}: (X BULLWINKLE)
add_leaf_after{1}: (Q ROCKY)
applied to this input:
( (IP-MAT (PP (P Unto)
              (NP (D that)))
          (NP-SBJ (PRO they)
                  (QP (Q all)))
          (ADVP (ADV well))
          (VBD accordyd))
  (ID CMMALORY,5.110) )
produces this output:
/~*
BULLWINKLE Unto ROCKY that they all well accordyd
(CMMALORY,5.110)
*~/
/*
1 IP-MAT:  2 PP, 3 P
*/

( (IP-MAT (PP (X BULLWINKLE)
              (P Unto)
              (Q ROCKY)
              (NP (D that)))
          (NP-SBJ (PRO they)
                  (QP (Q all)))
          (ADVP (ADV well))
          (VBD accordyd))
  (ID CMMALORY,5.110))
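
To make the effect concrete, here is a small Python sketch that models a tree as nested lists, with a leaf written as [pos, text]. The representation and the function bodies are assumptions chosen for illustration, not CorpusSearch internals. The new leaf becomes a sister of the flagged node, immediately before or after it.

```python
def position(parent, node):
    # Find the node by identity, so equal-looking sisters are not confused.
    return next(i for i, n in enumerate(parent) if n is node)

def add_leaf_before(parent, node, leaf):
    parent.insert(position(parent, node), leaf)

def add_leaf_after(parent, node, leaf):
    parent.insert(position(parent, node) + 1, leaf)

# (PP (P Unto) (NP (D that)))
p = ["P", "Unto"]
pp = ["PP", p, ["NP", ["D", "that"]]]

add_leaf_before(pp, p, ["X", "BULLWINKLE"])
add_leaf_after(pp, p, ["Q", "ROCKY"])
print(pp)
```

After both calls the PP holds, in order, (X BULLWINKLE), (P Unto), (Q ROCKY), and the NP, matching the output tree above.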

move_up_node

This query:
node: IP*
query:  (NP iDoms {1}D)

move_up_node{1}:
applied to this input:
( (IP-MAT (ADVP-TMP (ADV Thenne))
          (PP (P in)
              (NP (Q all) (N haste)))
          (VBD came)
          (NP-SBJ (NPR Uther))
          (PP (P with)
              (NP (D a) (ADJ grete) (N hoost))))
   (ID CMMALORY,3.37))
produces this output:
( (IP-MAT (ADVP-TMP (ADV Thenne))
          (PP (P in)
              (NP (Q all) (N haste)))
          (VBD came)
          (NP-SBJ (NPR Uther))
          (PP (P with)
              (D a)
              (NP (ADJ grete) (N hoost))))
  (ID CMMALORY,3.37))
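Notice that the direction of movement is constrained by word order: the moved node keeps its place in the sequence of words, so here D, the first child of NP, lands immediately before NP. If the node to move is a middle or only child, a warning is given and the tree is not changed.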
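
The constraint can be sketched in Python on a nested-list tree (a leaf is [pos, text]). This is a model under stated assumptions, not CorpusSearch's code: the example above shows a first child landing just before its old parent, and the sketch assumes by symmetry that a last child lands just after it. A middle or only child is refused and the tree is left alone.

```python
def move_up_node(grandparent, parent, child):
    """Move child out of parent, up into grandparent. Return True if moved."""
    children = parent[1:]
    is_first = child is children[0]
    is_last = child is children[-1]
    if len(children) < 2 or not (is_first or is_last):
        return False                 # middle or only child: tree unchanged
    where = next(i for i, n in enumerate(grandparent) if n is parent)
    if is_first:
        del parent[1]
        grandparent.insert(where, child)       # just before the old parent
    else:
        del parent[-1]
        grandparent.insert(where + 1, child)   # assumed: just after it
    return True

# (PP (P with) (NP (D a) (ADJ grete) (N hoost)))
d = ["D", "a"]
np = ["NP", d, ["ADJ", "grete"], ["N", "hoost"]]
pp = ["PP", ["P", "with"], np]
moved = move_up_node(pp, np, d)
print(moved, pp)

# A middle child cannot move without disturbing word order.
adj = ["ADJ", "greate"]
np_acc = ["NP-ACC", ["Q", "no"], adj, ["NS", "matters"]]
ip = ["IP-MAT", np_acc]
refused = not move_up_node(ip, np_acc, adj)
print(refused, ip)
```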

move_up_nodes

This query:
node: IP*
query:  ({1}Q iprecedes {2}ADJ)

move_up_nodes{1, 2}:
applied to this input:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (NP-ACC (Q no) (ADJ greate) (NS matters))
          (NP-TMP (D this) (N time))
          (. ,)) (ID KNYVETT-1630,87.25))
produces this output:
( (IP-MAT (NP-SBJ (PRO You))
          (MD must)
          (NEG not)
          (VB exspecte)
          (Q no)
          (ADJ greate)
          (NP-ACC (NS matters))
          (NP-TMP (D this) (N time))
          (. ,))
  (ID KNYVETT-1630,87.25))
If the indicated move would leave an internal node with no leaf descendants, a warning is given and the tree is not changed.

add_internal_node

This query:
node: IP*
query:  ({1}MD HasSister {2}VB)

add_internal_node{1, 2}: MDVP
applied to this input:
( (IP-MAT-SPE (' ')
              (NP-VOC (N Sir))
              (, ,)
              (' ')
              (IP-MAT-PRN (VBD said)
                          (NP-SBJ (NPR Ulfius)))
              (, ,)
              (' ')
              (NP-SBJ (PRO he))
              (MD wille)
              (NEG not)
              (VB dwelle)
              (NP-MSR (ADJ long))
              (E_S .)
              (' '))
  (ID CMMALORY,3.66))
produces this output:
( (IP-MAT-SPE (' ')
              (NP-VOC (N Sir))
              (, ,)
              (' ')
              (IP-MAT-PRN (VBD said)
                          (NP-SBJ (NPR Ulfius)))
              (, ,)
              (' ')
              (NP-SBJ (PRO he))
              (MDVP (MD wille) (NEG not) (VB dwelle))
              (NP-MSR (ADJ long))
              (E_S .)
              (' '))
  (ID CMMALORY,3.66))
If the addition of the indicated node would produce crossing branches in the tree, a warning is given and the tree is not changed.

To add an internal node spanning just one existing node, list the same index twice. For instance, this query:

query: (IP* iDoms {1}BE*)

add_internal_node{1, 1}: VP
applied to this input:
( (IP-MAT-SPE (CONJ but)
              (ADVP (ADV truly))
              (NP-VOC (N gossip))
              (NP-SBJ (PRO you))
              (BEP are)
              (ADJP (ADJ welcome))
              (. ,))
  (ID DELONEY,69.9))
produces this output:
/~*
but truly gossip you are welcome,
(DELONEY,69.9)
*~/
/*
1 IP-MAT-SPE:  1 IP-MAT-SPE, 13 BEP
*/
( (IP-MAT-SPE (CONJ but)
              (ADVP (ADV truly))
              (NP-VOC (N gossip))
              (NP-SBJ (PRO you))
              (VP (BEP are))
              (ADJP (ADJ welcome))
              (. ,))
  (ID DELONEY,69.9))
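
Both uses of add_internal_node can be modeled with one Python function on a nested-list tree (a leaf is [pos, text]). The sketch is an illustration only and makes a simplifying assumption: it handles just the case where the two flagged nodes are sisters, and refuses anything else. It does not attempt CorpusSearch's actual crossing-branches check.

```python
def add_internal_node(parent, first, last, new_label):
    """Wrap the sisters from first through last in a new node. Return True if done."""
    index = {id(n): i for i, n in enumerate(parent) if i > 0}
    if id(first) not in index or id(last) not in index:
        return False     # assumption: only sisters are handled in this sketch
    lo, hi = sorted((index[id(first)], index[id(last)]))
    # Everything between the two flagged nodes is included in the new node.
    parent[lo:hi + 1] = [[new_label] + parent[lo:hi + 1]]
    return True

md, neg, vb = ["MD", "wille"], ["NEG", "not"], ["VB", "dwelle"]
ip = ["IP-MAT-SPE", ["NP-SBJ", ["PRO", "he"]], md, neg, vb,
      ["NP-MSR", ["ADJ", "long"]]]
add_internal_node(ip, md, vb, "MDVP")
print(ip)

# Listing the same node twice wraps just that node.
bep = ["BEP", "are"]
ip2 = ["IP-MAT-SPE", ["NP-SBJ", ["PRO", "you"]], bep,
       ["ADJP", ["ADJ", "welcome"]]]
add_internal_node(ip2, bep, bep, "VP")
print(ip2)
```

Note that NEG, which sits between the two flagged nodes, ends up inside MDVP, as in the output tree above.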

delete_leaf

The argument specified in the query can match either a part of speech or text node: in either case, the entire part-of-speech/text pair is deleted.

If the indicated leaf is an only child, a warning is given and the tree is not changed.

This query:

node: IP*
ignore_nodes: null
query: (NP* iDoms {1}\**)

delete_leaf{1}:
applied to this input:
( (CP-QUE-SPE (INTJP (INTJ Tush))
              (NP-VOC (N woman))
              (, ,)
              (WNP-1 (WPRO what))
              (IP-SUB-SPE (NP-ACC *T*-1)
                          (VBP talke)
                          (NP-SBJ (PRO you))
                          (PP (P of)
                              (NP (D that))))
              (. ?)) (ID DELONEY,70.40))
produces this output:
/~*
Tush woman, what talke you of that?
(DELONEY,70.40)
*~/
/*
13 IP-SUB-SPE:  14 NP-ACC, 15 *T*-1
*/

( (CP-QUE-SPE (INTJP (INTJ Tush))
              (NP-VOC (N woman))
              (, ,)
              (WNP-1 (WPRO what))
              (IP-SUB-SPE (VBP talke)
                          (NP-SBJ (PRO you))
                          (PP (P of)
                              (NP (D that))))
              (. ?))
  (ID DELONEY,70.40))

delete_node

This is what syntacticians call "pruning". An internal node is deleted, but its descendants remain.

This query:

node: FRAG*

query: ({1}ADVP* iDoms ADV*)

delete_node{1}:
applied to this input:
( (FRAG-SPE (WNP (WPRO What))
            (ADVP-TMP (ADV neuer))
            (NP (D a) (ADJ great) (N belly))
            (ADVP (ADV yet))
            (. ?)) (ID DELONEY,69.5))
yields this output:
/~*
What neuer a great belly yet?
(DELONEY,69.5)
*~/
/*
1 FRAG-SPE:  5 ADVP-TMP, 6 ADV
1 FRAG-SPE:  15 ADVP, 16 ADV
*/

( (FRAG-SPE (WNP (WPRO What))
            (ADV neuer)
            (NP (D a) (ADJ great) (N belly))
            (ADV yet)
            (. ?))
  (ID DELONEY,69.5))

delete_subtree

This deletes the indicated node and all its descendants.

This query:

node: IP*
query:  ({1}CONJP* iDoms CONJ*)

delete_subtree{1}:
applied to this input:
( (IP-MAT (NP-SBJ (PRO I))
          (VBP hear)
          (CP-THT (C 0)
                  (IP-SUB (NP-SBJ (NP (N Lady) (N Banbery))
                                  (CONJP-1 (CONJ and)
                                           (NP (D y=e=)
                                               (N Wardon)
                                               (PP (P of)
                                                   (NP (NPR All) (NPRS Souls))))))
                          (BEP is)
                          (ADJP (ADJ dead))))
          (. .)) (ID ALHATTON,2,242.21))
results in this output:
( (IP-MAT (NP-SBJ (PRO I))
          (VBP hear)
          (CP-THT (C 0)
                  (IP-SUB (NP-SBJ (NP (N Lady) (N Banbery)))
                          (BEP is)
                          (ADJP (ADJ dead))))
          (. .))
  (ID ALHATTON,2,242.21))
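
The difference between delete_node and delete_subtree is easy to see side by side. The following Python sketch uses a nested-list tree (a leaf is [pos, text]); the representation is an assumption for illustration and the code is not taken from CorpusSearch. Pruning splices the children into the place of the deleted node, while deleting a subtree discards them.

```python
def position(parent, node):
    # Find the node by identity, so equal-looking sisters are not confused.
    return next(i for i, n in enumerate(parent) if n is node)

def delete_node(parent, node):
    """Prune: remove an internal node, keeping its children in its place."""
    i = position(parent, node)
    parent[i:i + 1] = node[1:]

def delete_subtree(parent, node):
    """Remove the node together with all its descendants."""
    del parent[position(parent, node)]

# (FRAG-SPE (WNP (WPRO What)) (ADVP-TMP (ADV neuer)) ...)
advp = ["ADVP-TMP", ["ADV", "neuer"]]
frag = ["FRAG-SPE", ["WNP", ["WPRO", "What"]], advp]
delete_node(frag, advp)
print(frag)

# (NP-SBJ (NP (N Lady) (N Banbery)) (CONJP-1 (CONJ and) ...))
conjp = ["CONJP-1", ["CONJ", "and"], ["NP", ["D", "y=e="], ["N", "Wardon"]]]
np_sbj = ["NP-SBJ", ["NP", ["N", "Lady"], ["N", "Banbery"]], conjp]
delete_subtree(np_sbj, conjp)
print(np_sbj)
```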

examples

Here is an example from a Portuguese corpus. The contraction "dos" had been treated as one word, but the corpus-builders later decided to split it into two pieces, a preposition "$de", and a determiner "os":

Old:

          (PP (P+D-P dos)
              (NP (ADJ-P grandes)
                  (N-P homens)

New:

         (PP (P $de)
             (NP (D-P os)
                 (ADJ-P grandes)
                 (N-P homens)

To make the above change, use this query file:

node: IP*
//copy_corpus: t
query: (PP iDoms {1}P+D-P) AND
       (P+D-P iDoms {2}dos) AND
       (P+D-P iPres NP) AND
       (NP iDomsFirst {3}*)

replace_label{1}: P
replace_label{2}: $de
add_leaf_before{3}: (D-P os)

The query file as shown will produce a standard CS output file. To produce instead a copy of the whole input corpus file with the described changes applied, uncomment the "copy_corpus: t" line by removing the leading "//".