Task 2 dataset description in numbers. The table gives the basic counts for the task 2 training and test datasets. The full-text articles in the training set came from the Journal of Biological Chemistry (JBC), Nature Medicine, Nature Genetics and Oncogene, whereas all test set articles came from JBC.

| Data set description | Training set | Test set 2.1 | Test set 2.2 | Data type |
|---|---|---|---|---|
| Full text articles | 803 | 113 | 99 | free text |
| Total GO annotations | 2317 | 1076 | 1227 | annotations |
| Number of proteins in the GO annotations | 939 | 138 | 138 | proteins |
| Number of GO terms used in the GO annotations | 776 | 580 | 544 | GO terms |
| Average number of annotations per protein | 2.467 | 7.797 | 8.891 | annotations |
| Annotations of Molecular Function GO terms | 709 | 330 | 356 | annotations |
| Annotations of Biological Process GO terms | 1061 | 544 | 701 | annotations |
| Annotations of Cellular Component GO terms | 547 | 182 | 170 | annotations |
| Molecular Function terms in the annotations | 343 | 173 | 179 | GO terms |
| Biological Process terms in the annotations | 339 | 334 | 314 | GO terms |
| Cellular Component terms in the annotations | 94 | 57 | 51 | GO terms |

Blaschke et al. BMC Bioinformatics 2005, 6(Suppl 1):S16. doi:10.1186/1471-2105-6-S1-S16
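The "Average number of annotations per protein" row can be reproduced from two other rows of the table: the total number of GO annotations divided by the number of proteins. The following is a minimal sketch of that arithmetic, not part of the original evaluation; the variable and function names are illustrative assumptions, and the counts are copied from the table.

```python
# Counts copied from the table, in the order: training, test 2.1, test 2.2.
SETS = ("Training set", "Test set 2.1", "Test set 2.2")
TOTAL_ANNOTATIONS = (2317, 1076, 1227)
PROTEINS = (939, 138, 138)
LISTED_AVERAGES = (2.467, 7.797, 8.891)


def annotations_per_protein(annotations: int, proteins: int) -> float:
    """Return the mean number of GO annotations per annotated protein."""
    return annotations / proteins


# Hypothetical helper structure: data set name -> computed average.
averages = {
    name: annotations_per_protein(a, p)
    for name, a, p in zip(SETS, TOTAL_ANNOTATIONS, PROTEINS)
}

for name, listed in zip(SETS, LISTED_AVERAGES):
    computed = averages[name]
    # Compare with a tolerance of 0.001 because the listed values
    # carry only three decimal places.
    agrees = abs(computed - listed) < 1e-3
    print(f"{name}: computed {computed:.4f}, listed {listed:.3f}, agrees: {agrees}")
```

Each computed value agrees with the listed one to within 0.001. The comparison uses a tolerance rather than exact equality because the table gives the averages to three decimal places only.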