|
Figure 1.
Validating clustering results by the mutual information: A schematic example. Each
gene is uniquely assigned to one functional category A_i and grouped into cluster C_j by a given clustering algorithm. The joint probabilities can be straightforwardly
estimated from the associated contingency table and the mutual information is calculated
according to Eq. (1). To assess how strongly the clustering is related to the annotation, the
value of the mutual information is compared to that obtained for random assignments of genes to
clusters: each gene is randomly assigned to a cluster, preserving the total number
of genes within each cluster but destroying any possible relationship between the
clustering and the functional annotation. The lower right plot shows the mutual information
compared to an ensemble of 500 randomized assignments. In this example, the z-score,
estimated according to Eq. (8), is S ≈ 3.8. For a z-score to be deemed significant, we further require that no random assignment
results in a mutual information equal to or larger than that of the tested annotation. Note that,
though we expect the mutual information to be zero for the randomized assignments,
the average estimated mutual information for randomized data has a bias towards positive
values due to finite-size effects [19,20]. As a rule of thumb, to obtain a reliable
estimate of the mutual information, the number of genes should be at least three times
larger than the number of clusters or functional categories [20].
Steuer et al. BMC Bioinformatics 2006 7:380 doi:10.1186/1471-2105-7-380 |
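
The procedure described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Eq. (1) and Eq. (8) are not reproduced in this section, so the sketch assumes that Eq. (1) is the standard plug-in mutual information computed from the contingency table, and that Eq. (8) is the usual z-score (observed value minus the mean of the randomized ensemble, divided by its standard deviation). The function names, the choice of bits as the unit, and the toy labels are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def mutual_information(categories, clusters):
    """Plug-in estimate of I(A;C) in bits from two equal-length label lists.

    Assumed form of Eq. (1): sum over cells of p(a,c) * log2(p(a,c) / (p(a) p(c))).
    """
    n = len(categories)
    joint = Counter(zip(categories, clusters))  # contingency table counts
    count_a = Counter(categories)               # row totals (annotation)
    count_c = Counter(clusters)                 # column totals (clustering)
    mi = 0.0
    for (a, c), n_ac in joint.items():
        # Same expression written with raw counts; empty cells contribute 0.
        mi += (n_ac / n) * math.log2(n_ac * n / (count_a[a] * count_c[c]))
    return mi


def permutation_z_score(categories, clusters, n_random=500, seed=0):
    """Compare the observed mutual information to shuffled cluster labels.

    Shuffling the cluster labels preserves the number of genes per cluster
    but destroys any relationship to the annotation.
    Returns (observed, null_mean, z, n_exceed).
    """
    rng = random.Random(seed)
    observed = mutual_information(categories, clusters)
    shuffled = list(clusters)
    null = []
    for _ in range(n_random):
        rng.shuffle(shuffled)
        null.append(mutual_information(categories, shuffled))
    null_mean = statistics.fmean(null)
    # Assumed form of Eq. (8): (observed - mean) / standard deviation.
    z = (observed - null_mean) / statistics.stdev(null)
    # Additional criterion from the caption: count random assignments whose
    # mutual information is equal to or larger than the observed value.
    n_exceed = sum(1 for value in null if value >= observed)
    return observed, null_mean, z, n_exceed


# Hypothetical toy data: 30 genes, 3 categories, 3 clusters.
categories = ["A1"] * 10 + ["A2"] * 10 + ["A3"] * 10
clusters = (["C1"] * 8 + ["C2"] * 2 +
            ["C2"] * 8 + ["C3"] * 2 +
            ["C3"] * 8 + ["C1"] * 2)

observed, null_mean, z, n_exceed = permutation_z_score(categories, clusters)
print(f"I = {observed:.3f} bits, null mean = {null_mean:.3f} bits, "
      f"z = {z:.1f}, exceedances = {n_exceed}")
```

In this sketch the mean of the randomized ensemble comes out above zero, which illustrates the finite-size bias mentioned in the caption. Under the caption's criterion, a result would count as significant only if the z-score is large and `n_exceed` is zero.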