Table 1

Training and test dataset: Datasets used for training and testing the support vector machines. The columns are: 1. The number of cDNA sequences used for training. 2. The number of cDNA sequences with BLAST hits that have GO molecular function terms. 3. The average number of GO molecular function terms per cDNA sequence obtained from the BLAST hits. 4. The class distribution of the GO terms obtained from the hits: positive if the GO term was similar to the original annotation, negative otherwise.

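| Organisms   | Number of cDNAs | cDNA with MF GO | Number of GO/cDNA | Class distribution: % Positive | Class distribution: % Negative |
|-------------|-----------------|-----------------|-------------------|--------------------------------|--------------------------------|
| Rat         | 1039            | 1036            | 36.90             | 25.7                           | 74.3                           |
| Fish        | 1061            | 1044            | 32.10             | 39.2                           | 60.8                           |
| Fly         | 5840            | 5574            | 25.47             | 23.4                           | 76.6                           |
| Worm        | 4272            | 3458            | 27.13             | 39.5                           | 60.5                           |
| Plasmodium  | 274             | 271             | 23.67             | 28.0                           | 72.0                           |
| Leishmania  | 82              | 82              | 20.51             | 35.1                           | 64.9                           |
| Yeast       | 3356            | 2972            | 18.60             | 23.7                           | 76.3                           |
| Bacillus    | 2729            | 2577            | 13.63             | 35.4                           | 64.6                           |
| Coxiella    | 931             | 900             | 12.33             | 37.0                           | 63.0                           |
| Shewanella  | 2413            | 2303            | 10.78             | 33.0                           | 67.0                           |
| Vibrio      | 1832            | 1804            | 12.54             | 31.9                           | 68.1                           |
| Arabidopsis | 8807            | 8120            | 26.66             | 30.2                           | 69.8                           |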

Vinayagam et al. BMC Bioinformatics 2004 5:116 doi:10.1186/1471-2105-5-116