Figure 1.

Structural and evolutionary features most predictive. Input features ranked by their cumulative contribution to performance, measured as AUC, the area under the receiver operating characteristic (ROC) curve (AUC* indicates values computed on a subset of the full cross-validation set). Our forward feature selection scheme found that three features together raised the AUC above 0.8: evolutionary information (PSIC [31] diff), predicted secondary structure (from PROFsec [32,33]) around the mutant (mutant position ± 8, i.e. 17 input units), and the PSI-BLAST information per residue for 21 consecutive residues. Six additional features only marginally increased performance, up to a mean AUC* of ~0.84: predicted flexibility (PROFbval, w=21), the differences in PSI-BLAST PSSM scores (PSSM diff) and in predicted secondary structure scores (PROFsec diff), the fit of the mutated position into a Pfam domain (PFam fit, w=13), scores for predicted protein-protein interaction hotspots (ISIS, w=13), and residue volumes (VOLUME, w=5). High variability in the AUC* distributions (long box plots, strong overlap between box plots) indicates that the selected features are unstable.

Schaefer and Rost BMC Genomics 2012 13(Suppl 4):S4   doi:10.1186/1471-2164-13-S4-S4
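A generic greedy forward selection scored by AUC, as described in the caption, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the caption does not specify the learner, so `evaluate` is a hypothetical callback that would normally train a model on the chosen features and return its held-out AUC. Here it simply sums toy feature values. AUC is computed with the Mann-Whitney formulation (probability that a positive outscores a negative, ties counting one half). The feature names `a` and `b` and their values are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))


def forward_selection(
    features: List[str],
    evaluate: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[Tuple[str, float]]:
    """Greedily add the feature that raises AUC most; stop when the gain is <= min_gain.

    Returns (feature, cumulative AUC) pairs in the order the features were added.
    """
    selected: List[str] = []
    history: List[Tuple[str, float]] = []
    best_so_far = 0.5  # AUC of random guessing
    remaining = list(features)
    while remaining:
        # Score every candidate added to the current feature set.
        best_auc, best_f = max((evaluate(selected + [f]), f) for f in remaining)
        if best_auc - best_so_far <= min_gain:
            break  # no candidate improves performance enough
        selected.append(best_f)
        remaining.remove(best_f)
        history.append((best_f, best_auc))
        best_so_far = best_auc
    return history


# Hypothetical toy data: feature "a" separates the classes, "b" is noise.
LABELS = [1, 1, 1, 0, 0, 0]
DATA: Dict[str, List[float]] = {
    "a": [0.9, 0.8, 0.7, 0.1, 0.2, 0.3],
    "b": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],
}


def evaluate(selected: List[str]) -> float:
    """Hypothetical stand-in for train-and-test: score = sum of selected feature values."""
    scores = [sum(DATA[f][i] for f in selected) for i in range(len(LABELS))]
    return auc(scores, LABELS)


if __name__ == "__main__":
    print(forward_selection(["a", "b"], evaluate))
```

On the toy data, `a` alone reaches an AUC of 1.0 and adding `b` yields no further gain, so selection stops after one feature. In a real setting, each `evaluate` call would be repeated across cross-validation splits; the spread of those AUC values is what the box plots in the figure summarize.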