Table 15

The features used in the baseline argument classification model


Predicate – The predicate lemma

Path – The syntactic path through the parse tree from the constituent being classified to the predicate

Constituent type

Position – Whether the phrase is located before or after the predicate

Voice – Passive if the predicate has the POS tag VBN, and its chunk is not a VP, or it is preceded by a form of "to be" or "to get" within its chunk; otherwise, active

Head word – Calculated using the head word table described by Collins (1999)

Head POS – The POS tag of the head word

Sub-categorization – The phrase structure rule that expands the predicate's parent node in the parse tree

First and last words and their POS tags

Level – The level in the parse tree


Predicate's verb class

Predicate POS tag

Predicate frequency

Predicate's context POS

Number of predicates


Parent, left sibling, and right sibling paths, constituent types, positions, head words, and head POS tags

Head of Prepositional Phrase (PP) parent – If the parent is a PP, then the head of this PP is also used as a feature


Predicate distance combination

Predicate phrase type combination

Head word and predicate combination

Voice position combination


Syntactic frame of predicate/NP

Head word suffixes of lengths 2, 3, and 4

Number of words in the phrase

Context words & POS tags

Tsai et al. BMC Bioinformatics 2007 8:325   doi:10.1186/1471-2105-8-325
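
As an illustration of how a few of the features listed in the table might be computed, the following Python sketch derives the Path, Sub-categorization, Position, and head word suffix features from a toy parse tree. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Node` class, the function names, the arrow notation for the path, the "->" rule notation, and the example sentence structure are all hypothetical and introduced here only for illustration.

```python
class Node:
    """Hypothetical parse-tree node: a label plus child nodes (assumption)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the chain [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def path_feature(constituent, predicate):
    """Path: labels from the constituent up to the lowest common
    ancestor, then down to the predicate. The arrow notation is an
    assumed convention, not taken from the table."""
    up = ancestors(constituent)
    down = ancestors(predicate)
    common = next(n for n in up if n in down)
    up_part = up[: up.index(common) + 1]
    down_part = down[: down.index(common)][::-1]
    path = "↑".join(n.label for n in up_part)
    for n in down_part:
        path += "↓" + n.label
    return path


def subcat_feature(predicate):
    """Sub-categorization: the rule expanding the predicate's parent."""
    parent = predicate.parent
    return parent.label + "->" + " ".join(c.label for c in parent.children)


def position_feature(constituent_start, predicate_index):
    """Position: whether the phrase starts before or after the predicate
    (token indices are an assumed input representation)."""
    return "before" if constituent_start < predicate_index else "after"


def suffix_features(head_word, lengths=(2, 3, 4)):
    """Head word suffixes of lengths 2, 3, and 4. Skipping words shorter
    than the requested length is an assumption of this sketch."""
    return {n: head_word[-n:] for n in lengths if len(head_word) >= n}


# Toy tree for a sentence of the shape "kinase activates transcription":
# (S (NP (NN)) (VP (VBZ) (NP (NN))))
predicate = Node("VBZ")
subject = Node("NP", [Node("NN")])
obj = Node("NP", [Node("NN")])
root = Node("S", [subject, Node("VP", [predicate, obj])])

print(path_feature(subject, predicate))
print(subcat_feature(predicate))
print(position_feature(0, 1))
print(suffix_features("kinase"))
```

For the toy tree, the subject NP reaches the predicate by going up to S and down through VP, while the object NP only needs to go up to VP. The remaining features in the table (for example, Voice or the verb class) depend on chunking output or external resources and are not covered by this sketch.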
