Additional file 4.

Summary of feature selection results.

(A) The ten most important features for classification by domain of life, as ranked by the feature selection algorithm of RF. A pair of box-and-whisker plots is shown for each numbered feature: 1, Gln content; 2, Leu content; 3, normalized frequency of extended structure; 4, negative charge; 5, average protein size in a proteome; 6, Glu content; 7, charge; 8, His content; 9, ratio of charged to non-charged amino acids; 10, Cys content. Within each pair, the plots represent bacteria (top) and archaea (bottom).

(B) The ten most important features for classification by halophilicity, as ranked by the feature selection algorithm of RF. A pair of box-and-whisker plots is shown for each numbered feature: 1, negative charge; 2, charge; 3, hydrophilicity value; 4, positive charge; 5, Gln content; 6, Glu content; 7, ratio of charged to non-charged amino acids; 8, normalized frequency of beta turn; 9, Asp content; 10, Phe content. Within each pair, the plots represent non-halophiles (top) and halophiles (bottom).

(C) The ten most important features for classification by thermophilicity, as ranked by the feature selection algorithm of RF. A triplet of box-and-whisker plots is shown for each numbered feature: 1, Gln content; 2, information measure for loop; 3, Glu content; 4, Val content; 5, normalized frequency of extended structure; 6, hydrophilicity value; 7, Tyr content; 8, Asp content; 9, negative charge; 10, Chou-Fasman parameter of the coil conformation. Within each triplet, the plots represent mesophiles, mesothermophiles and thermophiles from top to bottom.

In all plots, feature values are normalized to the range 0 to 1, increasing from left to right. Plus signs (+) mark outliers.

Format: PDF. Size: 3.7 MB.

Smole et al. BMC Evolutionary Biology 2011 11:26   doi:10.1186/1471-2148-11-26