|
Figure 1.
Structural and evolutionary features most predictive. Input features are ordered according to their cumulative contribution to performance, measured
by AUC, i.e. the area under the ROC curve (AUC* indicates that these values refer
to results for a subset of the full cross-validation set). Our forward feature selection
scheme suggested that three features raised performance above 0.8: evolutionary information
(PSIC [31] diff), predicted secondary structure (from PROFsec [32,33]) around the mutant
(mutant position ± 8, i.e. 17 input units), and the PSI-BLAST information
per residue for 21 consecutive residues. Six additional features increased
performance only marginally, up to a mean AUC* of ~0.84: predicted flexibility (PROFbval, w=21), differences
in both PSI-BLAST PSSM (PSSM diff) and predicted secondary structure scores (PROFsec
diff), the fit of the changed position into a PFam domain (PFam fit, w=13), scores for
predicted protein-protein interaction hotspots (ISIS, w=13) and residue volumes (VOLUME,
w=5). High variability in the AUC* distributions (long box plots, strong overlap between
box plots) indicates instability in the selected features.
Schaefer and Rost BMC Genomics 2012 13(Suppl 4):S4 doi:10.1186/1471-2164-13-S4-S4 |
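
The greedy forward feature selection referred to in the caption can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the toy values, and the `toy_evaluate` function (a stand-in for training and cross-validating a predictor on a feature subset) are assumptions made purely for illustration. AUC is computed here as the probability that a randomly chosen positive example outscores a randomly chosen negative one.

```python
from typing import Callable, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a random positive
    outscores a random negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_selection(
    candidates: Sequence[str],
    evaluate: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[Tuple[str, float]]:
    """Greedily add the feature that most improves evaluate(subset).

    Returns (feature, cumulative score) pairs in selection order and stops
    once no remaining feature improves the score by more than min_gain.
    """
    selected: List[str] = []
    history: List[Tuple[str, float]] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every candidate when added to the features chosen so far.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored)
        if top_score - best <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
        history.append((top_feature, top_score))
    return history


# Hypothetical toy data: per-sample feature values and binary labels.
LABELS = [1, 1, 1, 0, 0, 0]
FEATURES = {
    "psic_diff": [0.9, 0.8, 0.3, 0.4, 0.2, 0.1],
    "profsec": [0.2, 0.1, 0.9, 0.3, 0.1, 0.2],
    "noise": [0.5, 0.1, 0.4, 0.6, 0.2, 0.3],
}


def toy_evaluate(subset: List[str]) -> float:
    # Stand-in for model training plus cross-validation (an assumption):
    # each sample is scored by the sum of its selected feature values.
    scores = [sum(FEATURES[f][i] for f in subset) for i in range(len(LABELS))]
    return auc(scores, LABELS)


RANKING = forward_selection(list(FEATURES), toy_evaluate)

if __name__ == "__main__":
    for name, score in RANKING:
        print(f"{name}: cumulative AUC = {score:.3f}")
```

In a real setting, `evaluate` would retrain the predictor on each candidate subset and return its AUC on held-out data. Repeating the selection over different cross-validation splits would expose the kind of instability in the selected features that the caption describes.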