Table 1

Task 2 dataset description in numbers. The table gives the basic counts for the task 2 training and test datasets. The full-text articles of the training set came from the Journal of Biological Chemistry (JBC), Nature Medicine, Nature Genetics and Oncogene, while the test set articles were all from JBC.

| Data set description | Training set | Test set 2.1 | Test set 2.2 | Data Type |
|---|---|---|---|---|
| Full text articles | 803 | 113 | 99 | free text |
| Total of GO annotation | 2317 | 1076 | 1227 | annotations |
| Number of proteins in the GO annotations | 939 | 138 | 138 | proteins |
| Number of GO terms used in the GO annotations | 776 | 580 | 544 | GO terms |
| Average number of annotations per protein | 2.467 | 7.797 | 8.891 | annotations |
| Annotations of Molecular Function GO terms | 709 | 330 | 356 | annotations |
| Annotations of Biological Process GO terms | 1061 | 544 | 701 | annotations |
| Annotations of Cellular Component GO terms | 547 | 182 | 170 | annotations |
| Molecular Function terms in the annotations | 343 | 173 | 179 | GO terms |
| Biological Process terms in the annotations | 339 | 334 | 314 | GO terms |
| Cellular Component terms in the annotations | 94 | 57 | 51 | GO terms |
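
The "Average number of annotations per protein" row appears to be the total number of GO annotations divided by the number of proteins in each data set. The following minimal Python sketch recomputes that ratio from the counts in Table 1, under that assumption. The dictionary layout and the function name `annotations_per_protein` are illustrative and not taken from the article.

```python
# Counts copied from Table 1. The dictionary layout and all names here are
# illustrative assumptions, not part of the original article.
counts = {
    "Training set": {"annotations": 2317, "proteins": 939, "reported_avg": 2.467},
    "Test set 2.1": {"annotations": 1076, "proteins": 138, "reported_avg": 7.797},
    "Test set 2.2": {"annotations": 1227, "proteins": 138, "reported_avg": 8.891},
}


def annotations_per_protein(annotations: int, proteins: int) -> float:
    """Return the mean number of GO annotations per annotated protein."""
    return annotations / proteins


# Recompute the average for every data set.
averages = {
    name: annotations_per_protein(row["annotations"], row["proteins"])
    for name, row in counts.items()
}

# Print the recomputed ratio next to the value reported in the table.
for name, avg in averages.items():
    reported = counts[name]["reported_avg"]
    print(f"{name}: computed {avg:.4f}, reported {reported}")
```

The recomputed ratios can be compared with the reported values to within the three decimal places shown in the table.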


Blaschke et al. BMC Bioinformatics 2005 6(Suppl 1):S16   doi:10.1186/1471-2105-6-S1-S16
