Figure 2.
Histogram of the z-score (Eq. 8) for all individual attributes. The attributes with
the highest scores are marked. The figure is for the cell cycle dataset of Spellman
et al. [24], using the same preprocessing as described in [13] (see Appendix). After
filtering, ≈ 2500 GO attributes remained for evaluation. Repeating the analysis for
all datasets given in Table 2 yields similar results. The clustering was obtained
using a k-means algorithm with Euclidean distance and k = 25; the results do not change significantly for other choices of k (tested for k = 5 to 30, the range in which the z-score of the mutual information
is largest [13]). Note that the top-scoring attributes appear to be largely redundant;
e.g., a gene that is annotated to the cellular component 'cytosolic ribosome' can
intuitively be expected to also be annotated to the biological process 'protein biosynthesis'.
See the next section for details.
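To make the scored quantity more concrete, the following is a minimal sketch of a z-score for the mutual information between a clustering and a single binary attribute. It assumes that Eq. (8) compares the observed mutual information against a null distribution obtained by randomly shuffling the attribute across genes; the exact definition used in the paper may differ. The function names `mutual_information` and `mi_zscore` and the parameters `n_shuffles` and `seed` are hypothetical, and the cluster labels could come from any clustering (e.g. k-means with k = 25, as in the figure).

```python
import math
import random
from collections import Counter


def mutual_information(labels, attr):
    """Mutual information (bits) between cluster labels and a binary attribute.

    labels[i] is the cluster of gene i; attr[i] is 1 if gene i carries the
    attribute (e.g. a GO term), else 0.
    """
    n = len(labels)
    joint = Counter(zip(labels, attr))   # counts n(c, a)
    n_cluster = Counter(labels)          # counts n(c)
    n_attr = Counter(attr)               # counts n(a)
    mi = 0.0
    for (c, a), n_ca in joint.items():
        # p(c,a) * log2( p(c,a) / (p(c) p(a)) ) written in terms of counts
        mi += (n_ca / n) * math.log2(n_ca * n / (n_cluster[c] * n_attr[a]))
    return mi


def mi_zscore(labels, attr, n_shuffles=200, seed=0):
    """Hypothetical z-score: observed MI versus MI of shuffled attributes."""
    rng = random.Random(seed)
    observed = mutual_information(labels, attr)
    shuffled = list(attr)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)            # destroy any gene-cluster association
        null.append(mutual_information(labels, shuffled))
    mean = sum(null) / n_shuffles
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_shuffles - 1))
    return (observed - mean) / sd if sd > 0 else float("inf")
```

For example, with two clusters of 50 genes each, an attribute concentrated in one cluster yields a large positive z-score, whereas an attribute spread evenly over both clusters has zero mutual information and hence a negative z-score. Under this assumed definition, strongly co-annotated attributes such as 'cytosolic ribosome' and 'protein biosynthesis' would receive similar scores, consistent with the redundancy noted above.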
Steuer et al. BMC Bioinformatics 2006 7:380 doi:10.1186/1471-2105-7-380