Table 4. The distribution of clusters with their characteristics given different values for k (the number of clusters) from 500 to 3,000.

| k | 500 | 1,000 | 2,000 | 3,000 |
|---|---|---|---|---|
| Single Species cluster | 422 (84.4%) | 904 (90.4%) | 1897 (94.9%) | 2894 (96.5%) |
| # of Phenocopy-Pairs (of 25) | 25 (100%) | 13 (52%) | 12 (48%) | 8 (32%) |
| Cluster w/PT-Sim ≥ 0.4 | 92 (18.4%) | 293 (29.3%) | 526 (26.3%) | 810 (40.5%) |
| # Genes | 3221 | 5886 | 6379 | 6878 |
| Cluster w/GO-Sim ≥ 0.4 | 51 (10.2%) | 206 (20.6%) | 522 (26.1%) | 921 (46.1%) |
| Correlation GO-Sim vs PT-Sim | 0.53 | 0.41 | 0.37 | 0.28 |
| # Genes | 863 | 1800 | 2392 | 3065 |
| Cluster w/PPi ≥ 75% | 21 (4.2%) | 60 (6.0%) | 174 (8.7%) | 305 (10.2%) |
| # Genes | 1497 | 1858 | 2335 | 2702 |
| Cluster w/PPi ≥ 33% | 63 (12.6%) | 138 (13.8%) | 286 (14.3%) | 413 (13.8%) |
| # Genes | 3890 | 4322 | 4965 | 4996 |
| Cluster for GO-Predictions | 90 (18%) | 196 (19.6%) | 393 (19.7%) | 611 (20.4%) |
| # Genes | 2820 | 3213 | 4145 | 4546 |
| # Terms | 142 | 345 | 730 | 1226 |
| Precision | 72.55% | 67.91% | 63.40% | 60.31% |
| Recall | 16.73% | 22.98% | 25.63% | 28.32% |
| Avg. Genes/Cluster | 54 | 29 | 16 | 11 |

As an internal measure of cluster quality, we examined how the data structure changes for different values of k, ranging from 500 to 3,000. Filter 1 was applied for GO predictions. For details, see text.

Groth et al. BMC Bioinformatics 2008 9:136 doi:10.1186/1471-2105-9-136
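
The parenthesized percentages in the cluster rows can be read as the cluster count divided by k. The following minimal sketch illustrates that reading for the "Cluster w/PPi ≥ 75%" row only. The helper name `cluster_share` is hypothetical and is not taken from the paper, and the count-divided-by-k interpretation is an assumption rather than the authors' stated method.

```python
# Cluster counts from the "Cluster w/PPi >= 75%" row of Table 4, keyed by k.
ppi_75_clusters = {500: 21, 1000: 60, 2000: 174, 3000: 305}


def cluster_share(count: int, k: int) -> float:
    """Return `count` as a percentage of `k` clusters, rounded to one decimal.

    Hypothetical helper: assumes the table's percentages are count / k.
    """
    return round(100.0 * count / k, 1)


# Percentage of all k clusters that meet the PPi >= 75% criterion.
shares = {k: cluster_share(n, k) for k, n in ppi_75_clusters.items()}

for k, pct in shares.items():
    print(f"k={k}: {ppi_75_clusters[k]} clusters ({pct}%)")
```

Under this assumption, the share of clusters meeting the criterion rises with k for this row, while the average number of genes per cluster falls.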