Figure 5.

Feature importance for prediction of promoter usage level. A) Importance of features. As in Figure 2A, we assessed the importance of the 117 features for predicting promoter usage level, based on three different promoter usage measurements: RNA-Seq, poised RNAPII and running RNAPII. The importance (Y-axis) was measured by the influence of the feature on the mean squared error. The predictions of RNA-Seq and of RNAPII density in the gene body show similar patterns of importance, in which the downstream ChIP-Seq signals of the activating marks appear to be more informative. In contrast, the acetylation marks seem to be most important in predicting RNAPII recruitment, especially in the first bin after the TSS, which corresponds to the +1 nucleosome. B) Positional distributions of the four most informative histone marks, stratified by expression. The promoter set was divided into 5 classes according to mRNA expression measured by RNA-Seq. Within each class, we normalized the counts for H3K4me2, H3K4me3, H3K9ac and H3K27ac in all bins so that only the shape of the distributions is retained. For the methylation marks, the distributions vary more downstream of the TSS than upstream. The two acetylation marks have their highest relative signal around the position of the +1 nucleosome, while the methylation marks have high signals over ~5 downstream nucleosomes.

Chen et al. BMC Genomics 2011 12:544   doi:10.1186/1471-2164-12-544
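
The per-class normalization described for panel B can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions: the five classes are taken as equal-sized groups ranked by expression, and "normalized" is read as dividing the summed counts in each bin by the class total, so that each profile sums to 1. The function name `shape_profiles` and its arguments are hypothetical.

```python
import numpy as np

def shape_profiles(counts, expression, n_classes=5):
    """Return one normalized positional profile per expression class.

    counts     : (n_promoters, n_bins) ChIP-Seq counts for one histone mark
    expression : (n_promoters,) expression value per promoter (e.g. RNA-Seq)
    """
    counts = np.asarray(counts, dtype=float)
    expression = np.asarray(expression, dtype=float)

    # Assumption: classes are equal-sized groups of promoters ranked by expression.
    order = np.argsort(expression, kind="stable")
    classes = np.array_split(order, n_classes)

    profiles = []
    for idx in classes:
        per_bin = counts[idx].sum(axis=0)   # total counts in each bin for this class
        total = per_bin.sum()
        # Assumption: dividing by the class total removes the overall signal
        # level and keeps only the shape of the distribution.
        profiles.append(per_bin / total if total > 0 else np.zeros_like(per_bin))
    return np.vstack(profiles)              # shape: (n_classes, n_bins)
```

Because each row is divided by its own total, profiles from highly and lowly expressed classes can be compared directly even when their absolute signal levels differ.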
