Stepwise selection of HIV-1 protease positions and mutations that are important for cluster assignment. (a) Stepwise selection of HIV-1 protease positions (from left to right) such that, at each step, the mutual information (MI) between the amino acids at the positions selected so far and the cluster assignment is maximized. Red bars indicate the (bias-adjusted) MI between each individual position and the cluster assignment. Crosses ('x') mark the standard deviation of the MI estimate for two independent variables, and asterisks mark the threshold for statistical significance (p = 0.01). Blue bars show the estimated joint MI between the subset chosen so far and the cluster assignment. The black bar indicates the total information content of the cluster assignments. (b) Stepwise selection of the most informative amino acid identities at specific positions for assignment into phenotypic clusters. (c, d) Stepwise selection of particular amino acid identities whose collective presence or absence is maximally informative of membership in cluster 1 (c) and cluster 36 (d).
Doherty et al. BMC Bioinformatics 2011 12:477 doi:10.1186/1471-2105-12-477
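
The stepwise procedure described in panel (a) can be illustrated with a minimal sketch of greedy forward selection. This is not the authors' implementation: it uses a plain plug-in MI estimate with no bias adjustment and no significance threshold, and the function names (`mutual_information`, `stepwise_select`) and the toy sequences and cluster labels are hypothetical, chosen only to make the example self-contained.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in (uncorrected) MI in bits between two equal-length label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def stepwise_select(sequences, clusters, n_steps):
    """Greedily add the position that maximizes the joint MI between the
    residues at all positions chosen so far and the cluster labels."""
    n_pos = len(sequences[0])
    chosen, history = [], []
    for _ in range(min(n_steps, n_pos)):
        best_pos, best_mi = None, -1.0
        for pos in range(n_pos):
            if pos in chosen:
                continue
            cols = chosen + [pos]
            # Treat the residues at the candidate subset as one joint variable.
            joint_state = [tuple(seq[p] for p in cols) for seq in sequences]
            mi = mutual_information(joint_state, clusters)
            if mi > best_mi:  # ties keep the earlier position
                best_pos, best_mi = pos, mi
        chosen.append(best_pos)
        history.append((best_pos, best_mi))
    return history


# Hypothetical toy data: position 0 is constant; positions 1 and 2
# together determine the cluster label.
sequences = ["ALV", "ALI", "AMV", "AMI", "ALV", "ALI", "AMV", "AMI"]
clusters = [0, 1, 2, 3, 0, 1, 2, 3]

history = stepwise_select(sequences, clusters, n_steps=2)
total_information = mutual_information(clusters, clusters)  # entropy of labels
for pos, mi in history:
    print(f"position {pos}: joint MI = {mi:.2f} of {total_information:.2f} bits")
```

In this toy case, the joint MI after two steps equals the entropy of the cluster labels, which plays the role of the black bar in panel (a). With real data, the joint plug-in estimate becomes increasingly biased upward as positions are added and sample counts per joint state shrink, which is why a bias adjustment such as the one the caption mentions matters.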