Feature importance for promoter classification. We show the importance of each histone mark and positional bin for classifying regions as active promoters, silent promoters, or randomly selected non-promoter regions. Importance is computed as the mean decrease in accuracy, as defined by the Random Forest method; higher values indicate that a feature contributes more to the prediction. The x-axis spans the full set of 117 input features (13 bins for each of 9 epigenetic marks), where the middle bin corresponds to the 150 nt region around the TSS (see Figure 1A). The results show the strong influence of H3K4me2, H3K4me3 and H3K9ac in distinguishing active promoters.
Chen et al. BMC Genomics 2011 12:544 doi:10.1186/1471-2164-12-544
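The importance measure in the caption can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a simplified permutation scheme on a labelled evaluation set with an arbitrary classifier, whereas the Random Forest method permutes out-of-bag samples per tree. The function names, the `predict` callable, and the mark-major ordering of the 117 features are assumptions made for illustration; the caption does not specify them.

```python
import random

N_BINS = 13  # bins per epigenetic mark (from the caption)
N_MARKS = 9  # 117 input features / 13 bins per mark


def feature_label(index, mark_names, n_bins=N_BINS):
    """Map a flat feature index to (mark name, bin offset from the TSS bin).

    Assumes features are ordered mark by mark (hypothetical layout).
    Offset 0 is the middle bin, i.e. the region around the TSS.
    """
    mark, bin_idx = divmod(index, n_bins)
    return mark_names[mark], bin_idx - n_bins // 2


def accuracy(predict, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_in_accuracy(predict, X, y, n_repeats=10, seed=0):
    """Permutation importance: average drop in accuracy when one feature
    column is shuffled, breaking its association with the labels.

    X is a list of feature lists, y a list of class labels, and predict
    a callable mapping one feature list to a label.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the classifier ignores gets an importance of exactly zero under this scheme, while a feature the prediction depends on gets a positive value. `feature_label` then translates a position on the x-axis back to a mark and a bin relative to the TSS.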