Table 2

Dataset dimensionalities

Relation type                     Train instances  Test instances
--------------------------------  ---------------  --------------
Protein-Component (ST)                       1689             334
Subunit-Complex (ST)                          751             163

Equivalence (GENIA - E)                       720             129
Functional (GENIA - E)                        110              17
Locus (GENIA - E)                              11               5
Member-Collection (GENIA - E)                   5               0
Misc (GENIA - E)                               53              11
Object-Variant (GENIA - E)                     14               5
Out-of (GENIA - E)                             40               7
Protein-Component (GENIA - E)                 222              51
Subunit-Complex (GENIA - E)                   108              22

Member-Collection (GENIA - NE)                760             181
Protein-Component (GENIA - NE)                593             174
Subunit-Complex (GENIA - NE)                  275              82

Number of positive instances of the various types in the entity relation corpora. ST refers to the BioNLP'11 Shared Task data, while GENIA refers to the GENIA relation corpus. The latter corpus is further divided into embedded (E) and non-embedded (NE) cases. Datasets sufficiently large for classification analysis are in bold.

Van Landeghem et al. BMC Bioinformatics 2012 13(Suppl 11):S6   doi:10.1186/1471-2105-13-S11-S6
