Treebank 2a Guidelines

Ann Taylor
25 Oct 2006

Introduction

Treebank 2a is the annotation style used in the English Treebank being created as part of the OntoNotes Project (DARPA GALE). It is based on the original Penn Treebank II Style (Bies et al. 1995). In addition to the changes described in this document, it includes the revisions instituted by the LDC annotators, as described in Warner et al. (2004).

Bies, Ann, Mark Ferguson, Karen Katz, and Robert MacIntyre. 1995. Bracketing Guidelines for Treebank II Style. University of Pennsylvania.
Warner, Colin, Ann Bies, Christine Brisson, and Justin Mott. 2004. Addendum to the Penn Treebank II Style Bracketing Guidelines: BioMedical Treebank Annotation.

Raising and control

Treebank 2a makes use of a new empty category *PRO*. It replaces a subset of the cases of empty subjects labelled * in Treebank II. The * empty category is now used only for cases of syntactic movement, i.e. passives and subject raising. In addition, there has been a major reorganization of verbal complementation and control.

Raising versus subject control

In Treebank II, raising and subject control are not distinguished, both being represented as follows:

Treebank II style

( (S (NP-SBJ-1 (DT This) (NN trial))
     (VP (VBZ is)
         (VP (VBN expected)
             (S (NP-SBJ (-NONE- *-1))
                (VP (TO to)
                    (VP (VB last)
                        (NP-TMP (CD five) (NNS weeks)))))))
     (. .))
  ) 

( (S (NP-SBJ-1 (PRP We))
     (VP (VBP 've)
	 (VP (VBD tried)
	     (S (NP-SBJ (-NONE- *-1))
		(VP (TO to)
		    (VP (VB train)
			(NP (DT the) (NNS youngsters))))))))

  )


In Treebank 2a these cases are distinguished by the type of empty category: * indicates raising and *PRO* indicates subject control. Both are coindexed with the matrix subject.

Treebank 2a style: raising

( (S (NP-SBJ-1 (DT This) (NN trial))
     (VP (VBZ is)
         (VP (VBN expected)
             (S (NP-SBJ (-NONE- *-1))
                (VP (TO to)
                    (VP (VB last)
                        (NP-TMP (CD five) (NNS weeks)))))))
     (. .))
  ) 

Treebank 2a style: subject control

( (S (NP-SBJ-1 (PRP We))
     (VP (VBP 've)
	 (VP (VBD tried)
	     (S (NP-SBJ (-NONE- *PRO*-1))
		(VP (TO to)
		    (VP (VB train)
			(NP (DT the) (NNS youngsters))))))))

  )
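
The distinction between * and *PRO* can be read mechanically off a bracketed tree. The following is a minimal Python sketch, not part of the Treebank tooling: the helper names parse and empty_categories are hypothetical, and the two example trees are abridged versions of the ones above.

```python
import re

def parse(text):
    """Read one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return items

    return node()

def empty_categories(tree):
    """Yield (kind, index) for every -NONE- leaf, e.g. ('*PRO*', '1')."""
    if tree and tree[0] == "-NONE-":
        match = re.fullmatch(r"(.+?)(?:-(\d+))?", tree[1])
        yield match.group(1), match.group(2)
        return
    for child in tree:
        if isinstance(child, list):
            yield from empty_categories(child)

# Abridged versions of the raising and subject control examples.
raising = ("(S (NP-SBJ-1 (DT This) (NN trial)) (VP (VBZ is) (VP (VBN expected) "
           "(S (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB last)))))))")
control = ("(S (NP-SBJ-1 (PRP We)) (VP (VBD tried) "
           "(S (NP-SBJ (-NONE- *PRO*-1)) (VP (TO to) (VP (VB train))))))")

print(list(empty_categories(parse(raising))))  # [('*', '1')]
print(list(empty_categories(parse(control))))  # [('*PRO*', '1')]
```

An index of None would mean the empty category is not coindexed with anything in the tree.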

Raising predicates: appear, begin, continue, end up, fail, figure, happen, keep, need, ought, prove, quit, remain, say, seem, start, stop, tend, wind up, apt, certain, due, likely, sure, unlikely, be about to, be bound to, be going to, be set to, be supposed to, have to, have got to, turn out to, used to, plus the passives of ECM verbs.

Subject control verbs: admit, afford, agree, aim, apply, arrange, ask, attempt, avoid, be willing, bother, come, care, choose, claim, clamor, concede, conspire, decide, decline, delay, deny, deserve, determine, disclaim, discuss, enjoy, elect, favor, figure, flock, force, forget, get, go to show, hate, hesitate, hope, intend, jump, know, learn, like, look, love, manage, mean, mind, miss, move, negotiate, offer, opt, plan, pledge, plot, pose, ponder, prefer, prepare, press, proceed, profess, promise, propose, push, quite, race, recall, refuse, report, resolve, risk, rule out, rush, scramble, seek, serve, set out, sign, sound, stand, strive, struggle, suffice, swear, threaten, try, undertake, vote, vow, wait, want, wish

Tests for raising predicates

  1. can take expletive there subject
    there doesn't seem to be an answer to this question
    *there tried to be an answer to this question
    
    
  2. can take weather it subject
    it is likely to rain this week
    it began to rain this morning
    *it tried to rain this week
    
    
  3. can take idiom chunk subjects (and still be idiomatic)
    the cat seems to have got your tongue
    the shit started to hit the fan 
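
As a rough illustration of how the two predicate lists feed the choice of empty category, here is a small Python sketch. It uses only a handful of entries from each list, and the names RAISING, SUBJECT_CONTROL and candidate_empty_subjects are hypothetical. A predicate listed under both headings (such as figure) is returned with both candidates, on the assumption that the annotator then decides by applying the tests above.

```python
# Small illustrative subsets of the two lists, not the full inventories.
RAISING = {"appear", "begin", "continue", "figure", "happen", "seem", "tend"}
SUBJECT_CONTROL = {"agree", "attempt", "decide", "figure", "hope", "try", "want"}

def candidate_empty_subjects(predicate):
    """Return the empty subject types the lists allow for this predicate."""
    candidates = []
    if predicate in RAISING:
        candidates.append("*")       # raising: trace of syntactic movement
    if predicate in SUBJECT_CONTROL:
        candidates.append("*PRO*")   # subject control
    return candidates

print(candidate_empty_subjects("seem"))    # ['*']
print(candidate_empty_subjects("try"))     # ['*PRO*']
print(candidate_empty_subjects("figure"))  # ['*', '*PRO*']
```

An empty result means the predicate is on neither list in this sketch and says nothing about how it should be annotated.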
    
    

About to

About to is parsed as follows:
(S (NP-SBJ-1 (NNP Japan))
   (VP (VBZ is)
       (ADJP-PRD (JJ about)
		 (S (NP-SBJ (-NONE- *-1))
		    (VP (TO to)
			(VP (VB embark)
			    (PP-CLR (IN on)
				    (NP (DT a) (JJ major) (NN buying) (NN binge)))))))))

ECM versus object control

In Treebank II, only a small number of verbs which take sentential complementation were treated as ditransitive (see manual, page 246, for a list). By default, all other verbs were represented as monotransitive (i.e. ECM) as follows:

Treebank II style: monotransitive

( (S (NP-SBJ (NP (DT An) (NN accounting) (NN controversy))
             (PP-TMP (IN at)
                     (NP (NP (DT the) (NN end))
                         (PP (IN of)
                             (NP (JJ last) (NN year))))))
     (VP (VBD forced)
         (S (NP-SBJ (NNP Boston) (NNP Co.))
            (VP (TO to)
                (VP (VB admit)
                    (SBAR (-NONE- 0)
                          (S (NP-SBJ (PRP it))
                             (VP (VBD had)
                                 (VP (VBN overstated)
                                     (NP (JJ pretax) (NNS profits))
                                     (PP-MNR (IN by)
                                             (NP (QP (RB some) ($ $) (CD 44) (CD million))
                                                 (-NONE- *U*)))))))))))
     (. .)))

In Treebank 2a, we have tried to distinguish ECM from object control on more linguistic grounds. Object control is indicated in the same way as subject control: the empty subject is labelled *PRO* and coindexed to the controlling object.

Treebank 2a style: object control

( (S (NP-SBJ (NP (DT An) (NN accounting) (NN controversy))
             (PP-TMP (IN at)
                     (NP (NP (DT the) (NN end))
                         (PP (IN of)
                             (NP (JJ last) (NN year))))))
     (VP (VBD forced)
         (NP-1 (NNP Boston) (NNP Co.))
         (S (NP-SBJ (-NONE- *PRO*-1))
            (VP (TO to)
                (VP (VB admit)
                    (SBAR (-NONE- 0)
                          (S (NP-SBJ (PRP it))
                             (VP (VBD had)
                                 (VP (VBN overstated)
                                     (NP (JJ pretax) (NNS profits))
                                     (PP-MNR (IN by)
                                             (NP (QP (RB some) ($ $) (CD 44) (CD million))
                                                 (-NONE- *U*)))))))))))
     (. .)))
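
The change from the Treebank II monotransitive analysis to the Treebank 2a object control analysis can be sketched as a tree rewrite. The Python below is an illustration only, not the conversion procedure actually used. The helper names, the small set OBJECT_CONTROL_FORMS (inflected forms rather than lemmas) and the abridged sentence are all assumptions, and the rewrite handles only a verb directly followed by an S whose overt NP-SBJ carries no index.

```python
import itertools
import re

# Illustrative inflected forms of a few verbs from the ditransitive list.
OBJECT_CONTROL_FORMS = {"forced", "urged", "told"}

def parse(text):
    """Read one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return items

    return node()

def show(tree):
    """Write nested lists back out as a one-line bracketing."""
    return "(" + " ".join(show(c) if isinstance(c, list) else c for c in tree) + ")"

def is_word(node, forms):
    """True for a preterminal such as (VBD forced) whose word is in forms."""
    return (isinstance(node, list) and len(node) == 2
            and isinstance(node[1], str) and node[1] in forms)

def has_overt_subject(node):
    """True for (S (NP-SBJ <overt material>) ...)."""
    if not (isinstance(node, list) and len(node) > 1 and node[0] == "S"):
        return False
    subj = node[1]
    return (isinstance(subj, list) and len(subj) > 1 and subj[0] == "NP-SBJ"
            and not (isinstance(subj[1], list) and subj[1][0] == "-NONE-"))

def to_object_control(tree, forms, counter):
    """Rewrite (VP V (S (NP-SBJ X) ...)) as
    (VP V (NP-i X) (S (NP-SBJ (-NONE- *PRO*-i)) ...))."""
    if not isinstance(tree, list):
        return tree
    kids = [to_object_control(c, forms, counter) for c in tree]
    if (kids[0] == "VP" and len(kids) >= 3
            and is_word(kids[1], forms) and has_overt_subject(kids[2])):
        i = next(counter)  # fresh coindex supplied by the caller
        clause = kids[2]
        obj = [f"NP-{i}"] + clause[1][1:]
        pro = ["NP-SBJ", ["-NONE-", f"*PRO*-{i}"]]
        kids[2:3] = [obj, ["S", pro] + clause[2:]]
    return kids

tb2_style = ("(S (NP-SBJ (DT A) (NN controversy)) (VP (VBD forced) "
             "(S (NP-SBJ (NNP Boston) (NNP Co.)) (VP (TO to) (VP (VB admit))))))")
print(show(to_object_control(parse(tb2_style), OBJECT_CONTROL_FORMS, itertools.count(1))))
```

The counter starts at 1 here because the abridged tree has no other indices; a tree that already uses indices would need a starting value that does not clash with them.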

Ditransitive verbs: ask, admonish, authorize, bribe, catch, caution, choose, commission, compel, convince, designate, devise, direct, dispatch, empower, enable, encourage, entice, expand, force, hire, impel, induce, influence, inspire, invite, lead, leave, license, motivate, move, name, oblige, order, permit, press, prod, program, prompt, push, require, retain, schedule, select, sensitize, slate, spur, teach, tell, trust, urge, warn

ECM verbs: allow, assume, believe, cause, consider, declare, deem, estimate, expect, find, get, hold, imagine, intend, judge, know, make, mean, need, perceive, project, repute, rumor, report, say, see, show, suppose, think, want, wish

The tests for ECM verbs are basically the same as for raising. They can take infinitives with expletive there or weather it subjects, or an idiom chunk subject.

     Mary expected there to be a party
     *Mary persuaded there to be a party
     Mary believed it to be raining
     *Mary decided it to be raining
     Mary found the cat to be out of the bag
     *Mary encouraged the cat to be out of the bag


Small clauses and secondary predicates

The division between verbs taking a small clause and those taking an object and secondary predicate has been realigned in Treebank 2a. All 'label' verbs (appoint, call, elect, name, etc.) take a secondary predicate. Among other verbs, those that take a finite or non-finite (verbal) S argument (declare, proclaim, find, etc.) are treated as taking small clauses, while others (keep, leave, etc.) are treated as taking an object and a secondary predicate. Small clauses are labelled S, while secondary predicates are labelled S-CLR when object controlled, and S-CLR or S-ADV when subject controlled (see below).

Small clause:

( (S (NP-SBJ (PRP He))
     (ADVP (RB also))
     (VP (VP (VBZ considers)
             (S (NP-SBJ (DT the) (NN market))
                (ADJP-PRD (VBD overvalued))))
         (CC and)
         (VP (VBZ cites)
             (NP (NP (DT the) (NNS troubles))
                 (PP-LOC (IN in)
                         (NP (NN junk) (NNS bonds))))))
     (. .))
  ) 

( (S (NP-SBJ-1 (PRP You))
     (VP (VBP do)
         (RB n't)
         (VP (VB want)
             (S (NP-SBJ (-NONE- *PRO*-1))
                (VP (TO to)
                    (VP (VB get)
                        (S (NP-SBJ (PRP yourself))
                           (ADJP-PRD (RB too) (JJ upset)
                                     (PP (IN about)
                                         (NP (DT these) (NNS things))))))))))
     (. .))
  ) 

Secondary predicate:
( (S (NP-SBJ-1 (NP (NNP David) (NNP Tagg))
               (, ,)
               (RRC (ADVP-TMP (RB formerly))
		    (PP (IN in)
			(NP (NP (NN charge))
			    (PP (IN of)
				(NP (NN gambling) (NNS operations))))))
               (, ,))
     (VP (VBD was)
         (VP (VBN appointed)
             (NP-2 (-NONE- *-1))
             (S-CLR (NP-SBJ (-NONE- *PRO*-2))
                    (NP-PRD (NP (JJ chief) (NN executive))
                            (PP (IN for)
                                (NP (NN retailing) (CC and) (NN property)))))))
     (. .))
  ) 

( (S (NP-SBJ (NN uncertainty))
     (VP (VBZ drives)
         (NP-3 (NNS people))
         (S-CLR (NP-SBJ (-NONE- *PRO*-3))
                (ADJP-PRD (JJ wild)))))
  ) 

Verbs taking secondary predicates:
(1) Label verbs:
appoint, call, code-name, designate, dub, elect, entitle, headline, label, mark, name, nickname, quote, rank, rate, rename, subtitle, tag, term, title, vote
(2) Others: drive, keep, leave, render, turn

Verbs taking small clauses: believe, consider, declare, deem, fancy, fear, find, get, have, hold, judge, make, presume, proclaim, pronounce, prove, regard, render, report, rule, rumor, see, think, want, wish

S-ADV vs. S-CLR with secondary predicates

All secondary predicates coindexed to the object are labelled S-CLR. Secondary predicates coindexed to the subject are labelled S-ADV when the predicate can be paraphrased with while/being, or be preposed. Most subject-controlled secondary predicates are of the S-ADV type.
    he walked into the room naked
    he walked into the room while/being naked
    naked, he walked into the room

(S (NP-SBJ-1 he)
   (VP walked
       (PP-DIR into the room)
       (S-ADV (NP-SBJ *PRO*-1)
	      (ADJP-PRD naked))))

    he fell silent
    *he fell while/being silent
    *silent, he fell

(S (NP-SBJ-1 he)
   (VP fell
       (S-CLR (NP-SBJ *PRO*-1)
	      (ADJP-PRD silent))))
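
The labelling rule for secondary predicates can be summarised as a small decision function. This is a hedged Python sketch: the function name and its parameters are hypothetical, and the two boolean arguments stand for judgments the annotator makes by applying the while/being paraphrase and the preposing test.

```python
def secondary_predicate_label(controller, paraphrasable=False, preposable=False):
    """Pick S-CLR or S-ADV for a secondary predicate.

    controller    -- "object" or "subject": what the *PRO* subject is coindexed to
    paraphrasable -- the predicate can be paraphrased with while/being
    preposable    -- the predicate can be preposed
    """
    if controller == "object":
        return "S-CLR"
    if controller == "subject":
        return "S-ADV" if (paraphrasable or preposable) else "S-CLR"
    raise ValueError("controller must be 'object' or 'subject'")

# "he walked into the room naked": both tests succeed
print(secondary_predicate_label("subject", paraphrasable=True, preposable=True))  # S-ADV
# "he fell silent": both tests fail
print(secondary_predicate_label("subject"))  # S-CLR
# "uncertainty drives people wild": object controlled
print(secondary_predicate_label("object"))   # S-CLR
```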


Parentheticals

In Treebank II, the use of PRN was relatively unconstrained. In addition to sentential interpolations, it was used to enclose anything in parentheses or set off by dashes.
She finds the response of Marina residents (PRN -- primarily yuppies
and elderly people --) to the devastation of their homes ``
incredible.

It was also sometimes used to bracket 'elaborations' (commonly introduced by adverbs such as namely, especially, particularly, notably, primarily, mostly, etc.) when they were not set off by parentheses or dashes:
But the troubles of SCI TV are a classic tale of the leveraged buy-out
excesses of the 1980s, (PRN especially the asset-stripping game).

PRN was also used for various other things.

In Treebank 2a, the label PRN is restricted to sentential interpolations (including fragmentary comments). The coindexing system of Treebank II, in which the S was coindexed with a *T* empty category, has been replaced by an unindexed *?* empty category (following current policy in new treebanks).

( (S (ADVP (IN Besides))
     (PRN (, ,)
	  (S (NP-SBJ (NNP Eggers))
	     (VP (VBZ says)
		 (SBAR (-NONE- 0)
		       (S (-NONE- *?*)))))
	  (, ,))
     (NP-SBJ (NN grain) (NNS elevators))
     (VP (VBP are)
	 (PP-PRD (IN worth)
		 (S-NOM (NP-SBJ (-NONE- *PRO*))
			(VP (VBG preserving)
			    (PP-PRP (IN for)
				    (NP (JJ aesthetic) (NNS reasons))))))))
  )


Copular predicates

In Treebank II, the verbs appear, prove, seem, and turn out are treated as taking simple copular predicates (along with the other verbs listed on p. 113 of the manual); in Treebank 2a these verbs take a small clause.

Treebank II style: simple copular predicate
( (S (NP-SBJ-1 (QP (RB Almost) (DT no))
               (NN one))
     (VP (VBZ seems)
	 (ADJP-PRD (JJ optimistic)
		   (PP (IN about)
		       (NP (NP (DT the) (NN future))
			   (PP-LOC (IN of)
				   (NP (DT the) (NNP Mideast)))))))
     (. .)))

Treebank 2a style: small clause
( (S (NP-SBJ-1 (QP (RB Almost) (DT no))
               (NN one))
     (VP (VBZ seems)
         (S (NP-SBJ (-NONE- *-1))
            (ADJP-PRD (JJ optimistic)
                      (PP (IN about)
                          (NP (NP (DT the) (NN future))
                              (PP-LOC (IN of)
                                      (NP (DT the) (NNP Mideast))))))))
     (. .)))
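
The copular change can likewise be sketched as a tree rewrite that wraps the predicate in a small clause whose subject is a * trace coindexed with the matrix subject. The Python below is an illustration under stated assumptions, not the actual conversion procedure: the helper names and the set SMALL_CLAUSE_FORMS (a few inflected forms) are hypothetical, the example is abridged, and the rewrite applies only when the subject already carries an index and the VP consists of just the verb and a -PRD constituent.

```python
import re

# Illustrative inflected forms of two of the verbs affected by the change.
SMALL_CLAUSE_FORMS = {"seems", "seemed", "appears", "appeared"}

def parse(text):
    """Read one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return items

    return node()

def show(tree):
    """Write nested lists back out as a one-line bracketing."""
    return "(" + " ".join(show(c) if isinstance(c, list) else c for c in tree) + ")"

def label(node):
    """Return the node label, or "" for words and unlabelled wrappers."""
    return node[0] if isinstance(node, list) and isinstance(node[0], str) else ""

def is_word(node, forms):
    """True for a preterminal such as (VBZ seems) whose word is in forms."""
    return (isinstance(node, list) and len(node) == 2
            and isinstance(node[1], str) and node[1] in forms)

def to_small_clause(tree, forms):
    """Rewrite (S NP-SBJ-i (VP V X-PRD)) as
    (S NP-SBJ-i (VP V (S (NP-SBJ (-NONE- *-i)) X-PRD)))."""
    if not isinstance(tree, list):
        return tree
    kids = [to_small_clause(c, forms) for c in tree]
    if label(kids) != "S":
        return kids
    subj = next((k for k in kids[1:] if re.fullmatch(r"NP-SBJ-\d+", label(k))), None)
    vp = next((k for k in kids[1:] if label(k) == "VP"), None)
    if (subj and vp and len(vp) == 3 and is_word(vp[1], forms)
            and label(vp[2]).endswith("-PRD")):
        index = subj[0].rsplit("-", 1)[1]
        trace = ["NP-SBJ", ["-NONE-", f"*-{index}"]]  # raising trace
        vp[2] = ["S", trace, vp[2]]
    return kids

tb2_style = ("(S (NP-SBJ-1 (QP (RB Almost) (DT no)) (NN one)) "
             "(VP (VBZ seems) (ADJP-PRD (JJ optimistic))))")
print(show(to_small_clause(parse(tb2_style), SMALL_CLAUSE_FORMS)))
```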