Histogram of the z-score, Eq. (8), for all individual attributes. The attributes with the highest scores are marked. The figure shows the cell cycle dataset of Spellman et al., using the same preprocessing (see appendix). After filtering, ≈ 2500 GO attributes remained for evaluation. Repeating the analysis for all datasets given in Table 2 yields similar results. The clustering was obtained using a k-means algorithm with Euclidean distance and k = 25. The results do not change significantly for different choices of k (tested for k between 5 and 30, corresponding to the region where the z-score of the mutual information is largest). Note that the top-scoring attributes appear to be largely redundant: a gene that is annotated to the cellular component 'cytosolic ribosome' can intuitively be expected to be annotated to the biological process 'protein biosynthesis' as well. See next section for details.
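A per-attribute z-score of this kind can be illustrated with a minimal sketch. It assumes that the score compares the mutual information between a cluster assignment and a binary attribute annotation against a null distribution obtained by shuffling the cluster labels. Eq. (8) is not reproduced in this caption, so the exact definition used in the paper may differ. The function names, the number of shuffles, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(clusters, attribute):
    """Mutual information (in bits) between two discrete label lists."""
    n = len(clusters)
    joint = Counter(zip(clusters, attribute))
    count_c = Counter(clusters)
    count_a = Counter(attribute)
    mi = 0.0
    for (c, a), n_ca in joint.items():
        # p(c,a) * log2( p(c,a) / (p(c) p(a)) ), written with raw counts
        mi += (n_ca / n) * math.log2(n_ca * n / (count_c[c] * count_a[a]))
    return mi


def attribute_z_score(clusters, attribute, n_shuffles=200, seed=0):
    """z-score of the observed mutual information against shuffled clusterings.

    Assumption: the null model is a random permutation of the cluster
    labels, which keeps the cluster sizes fixed.
    """
    rng = random.Random(seed)
    observed = mutual_information(clusters, attribute)
    shuffled = list(clusters)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(mutual_information(shuffled, attribute))
    mean = sum(null) / n_shuffles
    std = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_shuffles - 1))
    return (observed - mean) / std if std > 0 else 0.0


# Toy data: 100 genes in 5 clusters; the attribute marks exactly cluster 0.
clusters = [i % 5 for i in range(100)]
enriched = [1 if c == 0 else 0 for c in clusters]
z_enriched = attribute_z_score(clusters, enriched)
```

In this toy setting an attribute concentrated in a single cluster receives a large positive score, whereas an attribute spread evenly across clusters would score close to zero. A histogram of such scores over all attributes corresponds to the kind of plot described in the caption.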
Steuer et al. BMC Bioinformatics 2006 7:380 doi:10.1186/1471-2105-7-380