Converting from CS1.1 to CS2

The easiest way to use the sociolinguistic information (metadata) included with the PCEEC corpus is to upgrade to CorpusSearch 2 (CS2). This guide is intended to help CS1.1 users make the switch. It covers essentially the same material as the "What's new?" section of the CorpusSearch 2 manual, but with more examples. The basic format and functionality of CS2 are similar to those of CS1.1, but there are a few crucial differences. The sections below point them out and show how to avoid common pitfalls.

This guide covers only functions that were already available in CS1.1. For the new functions in CS2, see the CorpusSearch 2 User's Guide.

Format changes

Query format

In CS2, when you write a multi-call query (one with more than one search-function call), you no longer need to wrap the additional calls in right-branching parentheses. If the only logical operator is AND, no extra parentheses are needed at all.
CS 1.1
query: (((A function B)
AND (C function D))
AND (E function F))

CS 2
query: (A function B)
AND (C function D)
AND (E function F)

CS2 will still accept multi-call queries written in the CS1.1 style, but it is best not to run CS1.1 query files in CS2 unchanged, as they will usually fail for other reasons described below.
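If you do have CS1.1 queries built only from AND, the extra parentheses can be stripped mechanically. The Python sketch below does this under two assumptions: every connective is AND, and no single search-function call contains parentheses of its own. The function flatten_and_query is hypothetical (not part of CorpusSearch), and it does not fix any of the other CS1.1/CS2 differences.

```python
import re

# One innermost search-function call, e.g. "(A function B)".
CALL = re.compile(r"\([^()]*\)")


def flatten_and_query(cs11_query: str) -> str:
    """Rewrite a right-branching CS1.1 AND query in the flat CS2 style.

    Assumes every connective is AND and that no single search-function call
    contains parentheses of its own; anything else raises ValueError.
    """
    body = cs11_query.strip()
    if body.startswith("query:"):
        body = body[len("query:"):]
    calls = CALL.findall(body)
    # After removing the calls, only grouping parentheses and ANDs should remain.
    connectives = re.sub(r"[()]", " ", CALL.sub(" ", body)).split()
    if not calls or len(connectives) != len(calls) - 1 or any(w != "AND" for w in connectives):
        raise ValueError("expected search-function calls joined only by AND")
    return "query: " + "\nAND ".join(calls)


cs11 = """query: (((A function B)
AND (C function D))
AND (E function F))"""
print(flatten_and_query(cs11))
```

Applied to the CS1.1 query above, it prints the three-line CS2 form shown under "CS 2".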

Likewise, if you use the new CS2 OR operator, you can chain single search-function calls without extra parentheses.

query: (A function B)
OR (C function D)
OR (E function F)

If you combine AND and OR, however, you must use parentheses to tell CS2 how to group the individual calls.
query: (A function B)
OR ((C function D)
AND (E function F))
is different from
query: ((A function B)
OR (C function D))
AND (E function F)
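To see why the grouping matters, the sketch below treats each call as a simple yes/no property of a tree (a simplification that ignores how CS2 links calls that share a search term) and compares the two queries over every combination of outcomes.

```python
from itertools import product


def first_grouping(a: bool, c: bool, e: bool) -> bool:
    # query: (A function B) OR ((C function D) AND (E function F))
    return a or (c and e)


def second_grouping(a: bool, c: bool, e: bool) -> bool:
    # query: ((A function B) OR (C function D)) AND (E function F)
    return (a or c) and e


# a, c, e: whether a tree satisfies the first, second and third call.
differences = [
    (a, c, e)
    for a, c, e in product([False, True], repeat=3)
    if first_grouping(a, c, e) != second_grouping(a, c, e)
]
print(differences)  # [(True, False, False), (True, True, False)]
```

The two groupings disagree only for trees that satisfy (A function B) but not (E function F): the first query returns such trees, the second does not.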

Function format

The format of search functions that take a number (such as column, domsWords, and iDomsNumber) has changed. In CS1.1 the number is attached to the end of the search-function name; in CS2 it must be separated from the name by a space.
CS 1.1
query: (CODING column1 a)
query: (NP domsWords3)
query: (NP domsWords>2)
query: (NP domsWords<5)
query: (IP* iDomsNumber1 CONJ)

CS 2
query: (CODING column 1 a)
query: (NP domsWords 3)
query: (NP domsWords> 2)
query: (NP domsWords< 5)
query: (IP* iDomsNumber 1 CONJ)
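Converting existing queries is a matter of inserting that space. The Python sketch below does it with a regular expression. It only knows the function names shown above (an assumption: any other numbered function would have to be added to NUMBERED), and separate_numbers is a hypothetical helper, not a CorpusSearch tool.

```python
import re

# Numbered search functions shown in this guide. "domsWords>" and "domsWords<"
# come before "domsWords" so the longer names are tried first.
NUMBERED = ["column", "domsWords>", "domsWords<", "domsWords", "iDomsNumber"]

PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in NUMBERED) + r")(\d+)\b"
)


def separate_numbers(query: str) -> str:
    """Insert the space CS2 requires between a function name and its number."""
    return PATTERN.sub(r"\1 \2", query)


for old in ["(CODING column1 a)", "(NP domsWords>2)", "(IP* iDomsNumber1 CONJ)"]:
    print(old, "->", separate_numbers(old))
```

Queries already in CS2 form are left unchanged, so the helper can safely be run twice.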

Search functions


The search functions Precedes and iPrecedes have been redefined. In CS1.1 these functions applied only to sisters. In CS2 they apply to any two nodes (i.e., CS2 Precedes is equivalent to CS1.1 AnyPrecedes).

To get the behavior of CS1.1 Precedes or iPrecedes in CS2, add a hasSister call to the Precedes or iPrecedes call.

CS 1.1
query: (A iPrecedes B)

CS 2
query: (A iPrecedes B)
AND (A hasSister B)
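The difference can be modelled on a toy tree. In this Python sketch each node records the first and last word it covers. Precedes and iPrecedes are simplified to comparisons of those word positions (an assumption that ignores any nodes CS2 may skip, such as punctuation), and the CS1.1 behavior is obtained by also requiring hasSister. All names (Node, annotate, i_precedes, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by contents
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaves
    parent: Optional["Node"] = field(default=None, repr=False)
    start: int = 0  # index of the first word the node covers
    end: int = 0    # index of the last word the node covers


def leaf(label: str, word: str) -> Node:
    return Node(label, word=word)


def phrase(label: str, *children: Node) -> Node:
    return Node(label, children=list(children))


def annotate(root: Node) -> None:
    """Fill in parent links and word spans, numbering words left to right."""
    next_word = 0

    def walk(node: Node, parent: Optional[Node]) -> None:
        nonlocal next_word
        node.parent = parent
        if node.word is not None:
            node.start = node.end = next_word
            next_word += 1
        else:
            for child in node.children:
                walk(child, node)
            node.start, node.end = node.children[0].start, node.children[-1].end

    walk(root, None)


def precedes(a: Node, b: Node) -> bool:
    """CS2-style Precedes (simplified): a ends before b starts; any two nodes."""
    return a.end < b.start


def i_precedes(a: Node, b: Node) -> bool:
    """CS2-style iPrecedes (simplified): b's first word directly follows a's last."""
    return a.end + 1 == b.start


def has_sister(a: Node, b: Node) -> bool:
    return a is not b and a.parent is not None and a.parent is b.parent


def cs11_i_precedes(a: Node, b: Node) -> bool:
    """CS1.1 iPrecedes, i.e. CS2 (A iPrecedes B) AND (A hasSister B)."""
    return i_precedes(a, b) and has_sister(a, b)


pro, vbd = leaf("PRO", "he"), leaf("VBD", "saw")
subj = phrase("NP-SBJ", pro)
obj = phrase("NP-OB1", leaf("D", "the"), leaf("N", "man"))
ip = phrase("IP-MAT", subj, vbd, obj)
annotate(ip)

print(i_precedes(pro, vbd))        # True: "he" is directly followed by "saw"
print(cs11_i_precedes(pro, vbd))   # False: PRO is inside NP-SBJ, not VBD's sister
print(cs11_i_precedes(subj, vbd))  # True: NP-SBJ and VBD are sisters
print(precedes(pro, obj))          # True
```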

hasSister is useful for finding sisters when you don't care what order they are in. For instance, the following query finds NPs that contain both a noun and an ADJP, whether the ADJP comes before or after the noun.
CS 2
node: NP*
query: (N hasSister ADJP)
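hasSister can be pictured as a check over the children of one node that looks at every pair regardless of order. In this sketch a node is just the list of its children's labels, and matches is a rough stand-in for CS search terms built on Python's fnmatch module; it is not how CorpusSearch itself matches labels. The example NPs are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(label: str, term: str) -> bool:
    """Rough stand-in for a CS search term: '|' separates alternatives, '*' is a wildcard."""
    return any(fnmatchcase(label, alternative) for alternative in term.split("|"))


def has_sister(child_labels: list, term_a: str, term_b: str) -> bool:
    """True if two different children of one node match term_a and term_b, in either order."""
    return any(
        i != j and matches(x, term_a) and matches(y, term_b)
        for i, x in enumerate(child_labels)
        for j, y in enumerate(child_labels)
    )


adjp_first = ["D", "ADJP", "N"]  # (NP (D a) (ADJP (ADJ good)) (N man))
adjp_last = ["D", "N", "ADJP"]   # (NP (D a) (N man) (ADJP (ADJ good)))
no_adjp = ["D", "N"]             # (NP (D a) (N man))
print([has_sister(kids, "N", "ADJP") for kids in (adjp_first, adjp_last, no_adjp)])
# [True, True, False]
```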


The CS2 search function iDomsMod lets you search conjoined and non-conjoined structures with a single query. It replaces the CS1.1 conjunction switches (iDoms_conj_switch, etc.).

iDomsMod takes three search terms, A, B, and C. The following query finds structures in which A dominates C, either immediately or through one or more intervening B nodes.

query: (A iDomsMod B C)

Thus, the following structures all match the query:
(A (B (...)
      (C (...))))

(A (B (...)
      (B (...)
         (C (...)))))

(A (C (...)))

The last structure matches because B is allowed, but not required, to intervene between A and C. The following query searches for NP subjects that dominate a quantifier, allowing an NP, or a conjunction phrase (CONJP), or both, to intervene:
node: IP*
query: (NP-SBJ* iDomsMod NP|CONJP Q)

It finds structures in which nothing intervenes between NP-SBJ and Q.
( (50 IP-SUB (51 NP-SBJ (52 Q any) (54 N thing))
             (56 BEP be)
             (58 ADJP (59 ADJ distastefull))
             (61 PP (62 P unto)
                    (64 NP (65 PRO y=u=))))
  (108 ID ARUNDEL,38.2.28)) 

and structures in which the Q is in the first or second conjunct.
Q in first conjunct:

( (4 IP-ABS (5 NP-SBJ (6 NP (7 Q every) (9 N rod))
                      (11 CONJP (12 CONJ &)
                                (14 NP (15 NUM half) (17 N rod))))
            (19 VAN accompted))
  (213 ID BACON,I,72.53.1430)) 

Q in second conjunct:

( (22 IP-SUB-1 (23 NP-SBJ (24 NP (25 PRO him) (27 N self))
                          (29 CONJP (30 CONJ &)
                                    (32 NP (33 Q noe) (35 OTHER other))))
               (37 , ,)
               (39 IP-PPL RMV:at_that_tyme...)
               (74 , ,)
               (76 MD might)
               (78 VB kill)
               (80 NP-OB1 (81 NP (82 NPR Hatton))
                          (84 CONJP (85 CONJ &)
                                    (87 NP (88 Q noe) (90 OTHER other)))))
  (117 ID BACON,I,96.76.1956)) 
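One way to read iDomsMod is as a downward walk from A in which every node passed through must match B, stopping at a node that matches C. The Python sketch below implements that reading (an interpretation of the examples above, not CorpusSearch's own code), with trees as (label, children) tuples and a rough fnmatch-based matcher for '*' and '|'. It checks the three schematic structures, a non-matching variant with a hypothetical D node, and the NP-SBJ of the second-conjunct example with its words dropped.

```python
from fnmatch import fnmatchcase

# A tree is (label, [children]); ETC stands for the "(...)" material above.
ETC = ("...", [])


def matches(label, term):
    """Rough stand-in for a CS search term: '|' separates alternatives, '*' is a wildcard."""
    return any(fnmatchcase(label, alternative) for alternative in term.split("|"))


def reaches(node, b, c):
    """True if node matches c, or matches b and one of its children reaches c."""
    label, children = node
    if matches(label, c):
        return True
    return matches(label, b) and any(reaches(child, b, c) for child in children)


def idoms_mod(node, a, b, c):
    """Sketch of (A iDomsMod B C) tested at one node."""
    label, children = node
    return matches(label, a) and any(reaches(child, b, c) for child in children)


schematic = {
    "A > B > C": ("A", [("B", [ETC, ("C", [ETC])])]),
    "A > B > B > C": ("A", [("B", [ETC, ("B", [ETC, ("C", [ETC])])])]),
    "A > C": ("A", [("C", [ETC])]),
    "A > D > C": ("A", [("D", [("C", [ETC])])]),  # D is not an allowed intervener
}
for name, tree in schematic.items():
    print(name, idoms_mod(tree, "A", "B", "C"))

# NP-SBJ from the BACON,I,96.76.1956 example, words dropped.
subject = ("NP-SBJ", [
    ("NP", [("PRO", []), ("N", [])]),
    ("CONJP", [("CONJ", []), ("NP", [("Q", []), ("OTHER", [])])]),
])
print(idoms_mod(subject, "NP-SBJ*", "NP|CONJP", "Q"))  # True
```

The first three schematic trees match and the D variant does not; the subject matches only because both CONJP and NP are allowed to intervene.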

Lexicon function

CS2 includes a much-improved lexicon function that can build lexicons restricted by part of speech (for instance, only adjectives, ADJ), by text (for instance, only words beginning with a), or by both. See the CS2 User's Guide for details.


Coding

Coding works somewhat differently in CS2. A CS2 coding query must specify a node, and the coding conditions must be introduced by the line coding_query:, as in this template:
node: IP*
coding_query:

1: { a: (search-function)
     b: (search-function)
     c: ELSE
   }

2: { d: (search-function)
     e: (search-function)
     f: ELSE
   }
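A rough model of how such a query fills in the CODING string: each numbered column is a list of conditions tried in order, the first one that holds supplies that column's code, ELSE always holds, and the codes are joined with colons. The sketch assumes that first-match order and the colon separator; the node model (a set of immediately dominated labels) and the helpers has and code_node are hypothetical simplifications.

```python
from typing import Callable, List, Optional, Set, Tuple

# A node is modelled as the set of labels it immediately dominates (an assumption
# made only for this sketch); a condition of None plays the role of ELSE.
Condition = Optional[Callable[[Set[str]], bool]]
Column = List[Tuple[str, Condition]]


def has(label: str) -> Callable[[Set[str]], bool]:
    """Hypothetical stand-in for a search function such as (IP* iDoms label)."""
    return lambda labels: label in labels


def code_node(labels: Set[str], columns: List[Column]) -> str:
    """Build a colon-separated CODING string, one code per column."""
    codes = []
    for number, column in enumerate(columns, start=1):
        for code, condition in column:
            if condition is None or condition(labels):
                codes.append(code)  # the first condition that holds wins
                break
        else:
            raise ValueError(f"column {number}: nothing matched and there is no ELSE")
    return ":".join(codes)


columns = [
    [("a", has("NP-SBJ")), ("b", has("MD")), ("c", None)],
    [("d", has("NP-OB1")), ("e", has("PP")), ("f", None)],
]
print(code_node({"NP-SBJ", "MD", "VB", "NP-OB1"}, columns))  # a:d
print(code_node({"MD", "VB", "PP"}, columns))                # b:e
print(code_node({"VB"}, columns))                            # c:f
```

In the first call both a and b could apply; a wins only because it is listed first, which is why the order of conditions in a column matters.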

When searching CODING strings, separate the column number from column with a space (see Function format):
query: (CODING column 1 a)
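Searching a CODING string then amounts to comparing one colon-separated field. This sketch assumes columns are numbered from 1 and uses exact string comparison only. Any wildcard or '|' handling that CS2 applies to the value is not modelled, and coding_column_matches is a hypothetical name.

```python
def coding_column_matches(coding: str, column: int, value: str) -> bool:
    """Rough analogue of (CODING column N value) with 1-based columns and exact matching."""
    fields = coding.split(":")
    return 1 <= column <= len(fields) and fields[column - 1] == value


print(coding_column_matches("a:d", 1, "a"))  # True
print(coding_column_matches("a:d", 2, "a"))  # False
print(coding_column_matches("a:d", 3, "a"))  # False: there is no third column
```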

For more detail on coding, see the CS2 User's Guide.