Table 1

Corpora



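| Property        | AIMed           | BioInfer            | HPRD50     | IEPA             | LLL               |
|-----------------|-----------------|---------------------|------------|------------------|-------------------|
| Size            | 1955            | 1100                | 145        | 486              | 77                |
| Entity scope    | human P/G       | P/G/R and related   | human P/G  | Chemicals        | P/G               |
| Entity coverage | all occurrences | all occurrences     | NER system | list of 16 names | list of 116 names |
| Entity types    | no              | 111 types (ontology)| no         | no               | P/G               |
| PPI types       | no              | 68 types (ontology) | no         | no               | 3 types           |
| PPI binding     | no              | yes                 | no         | yes              | no                |
| PPI directed    | no              | yes                 | no         | yes              | yes               |
| PPI complex     | no              | yes                 | no         | no               | no                |
| PPI negative    | no              | yes                 | no         | no               | no                |
| PPI certainty   | no              | no                  | yes        | no               | no                |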

Legend:

Size: Number of sentences in the corpus

Entity scope: Types of the named entities identified in the corpus: (P)rotein, (G)ene, (R)NA

Entity coverage: Coverage of in-scope entity occurrences in each sentence

Entity types: Explicit identification of the type of the annotated named entity occurrences

PPI types: Explicit indication of the type of the annotated interactions

PPI binding: Identification of the specific text spans that entail the annotated interactions

PPI directed: Specification of the directionality of the interaction (typically identification of agent vs. patient roles)

PPI complex: Annotation includes nested or n-ary (for n > 2) interactions

PPI negative: Annotation of negative interactions

PPI certainty: Annotation of the levels of certainty, or speculativeness, of interactions
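
As an illustration of how the properties defined in the legend might be used to select corpora for an experiment, the following minimal sketch encodes the size and the yes/no PPI rows of Table 1 as records and filters on them. The class name `CorpusProfile`, the field names, and the helper `corpora_with` are hypothetical names chosen for this example and are not part of any of the corpora or their tooling; the values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusProfile:
    """One column of Table 1, restricted to size and the yes/no PPI rows.

    Hypothetical structure for illustration only.
    """
    name: str
    size: int            # number of sentences in the corpus
    ppi_binding: bool    # text spans entailing the interaction are marked
    ppi_directed: bool   # interaction directionality is specified
    ppi_complex: bool    # nested or n-ary (n > 2) interactions annotated
    ppi_negative: bool   # negative interactions annotated
    ppi_certainty: bool  # certainty / speculation levels annotated


# Values transcribed from Table 1.
CORPORA = [
    CorpusProfile("AIMed", 1955, False, False, False, False, False),
    CorpusProfile("BioInfer", 1100, True, True, True, True, False),
    CorpusProfile("HPRD50", 145, False, False, False, False, True),
    CorpusProfile("IEPA", 486, True, True, False, False, False),
    CorpusProfile("LLL", 77, False, True, False, False, False),
]


def corpora_with(feature: str) -> list[str]:
    """Return the names of corpora whose annotation includes `feature`."""
    return [c.name for c in CORPORA if getattr(c, feature)]


if __name__ == "__main__":
    print(corpora_with("ppi_directed"))  # ['BioInfer', 'IEPA', 'LLL']
```

A filter of this kind makes explicit that, for example, an evaluation relying on interaction direction could draw on only three of the five corpora.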

Pyysalo et al. BMC Bioinformatics 2008 9(Suppl 3):S6   doi:10.1186/1471-2105-9-S3-S6