Training and test dataset: Datasets used for training and testing the support vector machines. The columns are: (1) the number of cDNA sequences for training; (2) the number of cDNA sequences with BLAST hits that have GO molecular function (MF) terms; (3) the average number of GO molecular function terms per cDNA sequence among the BLAST hits; (4) the class distribution of the GO terms coming from the hits, in percent: positive if the GO terms were similar to the original annotation, negative otherwise.

| Organisms | Number of cDNAs | cDNA with MF GO | Number of GO/cDNA | Class distribution: % Positive | Class distribution: % Negative |
|---|---|---|---|---|---|
| Rat | 1039 | 1036 | 36.90 | 25.7 | 74.3 |
| Fish | 1061 | 1044 | 32.10 | 39.2 | 60.8 |
| Fly | 5840 | 5574 | 25.47 | 23.4 | 76.6 |
| Worm | 4272 | 3458 | 27.13 | 39.5 | 60.5 |
| Plasmodium | 274 | 271 | 23.67 | 28.0 | 72.0 |
| Leishmania | 82 | 82 | 20.51 | 35.1 | 64.9 |
| Yeast | 3356 | 2972 | 18.60 | 23.7 | 76.3 |
| Bacillus | 2729 | 2577 | 13.63 | 35.4 | 64.6 |
| Coxiella | 931 | 900 | 12.33 | 37.0 | 63.0 |
| Shewanella | 2413 | 2303 | 10.78 | 33.0 | 67.0 |
| Vibrio | 1832 | 1804 | 12.54 | 31.9 | 68.1 |
| Arabidopsis | 8807 | 8120 | 26.66 | 30.2 | 69.8 |

Vinayagam et al. BMC Bioinformatics 2004 5:116 doi:10.1186/1471-2105-5-116