Table 1

Task 2 dataset description in numbers. The table gives the basic counts for the task 2 training and test datasets. The full-text articles of the training set came from the Journal of Biological Chemistry (JBC), Nature Medicine, Nature Genetics and Oncogene, while the test set articles were all from JBC.

| Data set description | Training set | Test set 2.1 | Test set 2.2 | Data type |
|---|---|---|---|---|
| Full text articles | 803 | 113 | 99 | free text |
| Total GO annotations | 2317 | 1076 | 1227 | annotations |
| Number of proteins in the GO annotations | 939 | 138 | 138 | proteins |
| Number of GO terms used in the GO annotations | 776 | 580 | 544 | GO terms |
| Average number of annotations per protein | 2.467 | 7.797 | 8.891 | annotations |
| Annotations of Molecular Function GO terms | 709 | 330 | 356 | annotations |
| Annotations of Biological Process GO terms | 1061 | 544 | 701 | annotations |
| Annotations of Cellular Component GO terms | 547 | 182 | 170 | annotations |
| Molecular Function terms in the annotations | 343 | 173 | 179 | GO terms |
| Biological Process terms in the annotations | 339 | 334 | 314 | GO terms |
| Cellular Component terms in the annotations | 94 | 57 | 51 | GO terms |
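
The "Average number of annotations per protein" row can be derived from two other rows: the total number of GO annotations divided by the number of proteins. The following is a minimal sketch of that arithmetic, not code from the paper. The names `TABLE_1` and `annotations_per_protein` are hypothetical, and truncation to three decimals is an assumption, chosen because it reproduces the listed values (2317/939 is about 2.4675, which would round to 2.468).

```python
# Counts copied from Table 1: (total GO annotations, number of proteins).
# The dictionary name and layout are illustrative assumptions.
TABLE_1 = {
    "Training set": (2317, 939),
    "Test set 2.1": (1076, 138),
    "Test set 2.2": (1227, 138),
}


def annotations_per_protein(annotations: int, proteins: int) -> float:
    """Average annotations per protein, truncated to three decimals.

    Truncation (not rounding) is an assumption that matches the table.
    Integer arithmetic is used so the truncation step is exact.
    """
    return (annotations * 1000 // proteins) / 1000


averages = {
    name: annotations_per_protein(annotations, proteins)
    for name, (annotations, proteins) in TABLE_1.items()
}

for name, average in averages.items():
    print(f"{name}: {average:.3f}")
```

Under these assumptions, the computed averages agree with the table: each protein in the test sets carries roughly three times as many annotations as a protein in the training set.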

Blaschke et al. BMC Bioinformatics 2005, 6(Suppl 1):S16. doi:10.1186/1471-2105-6-S1-S16