Distinguishing RNAPII stalling/elongation state using chromatin signals. A) Correlations between the observed and predicted S index. The box plots summarize the variation estimated by 10 times cross-validation for both Random Forests and linear models. The regression was done on four different feature sets: 1) all 9 histone modifications as well as methylation status, dinucleotide content and normalized GC content; 2) all 9 histone modifications; 3) all 9 histone modifications plus the transcription factors cMyc and NELFe; 4) only cMyc and NELFe. B) Feature importance in S index prediction. The importance of most marks for regression of the stalling index increases in the first bin after the TSS; this is also seen in the regression of poised RNAPII in the promoter (Figure 2B). NELFe and cMyc (assessed as an overall signal within the promoter region) have substantial, yet lower, predictive power compared to the chromatin data. C) Positional distributions of the four most informative histone marks, stratified by S index. The promoter set was divided into 5 classes according to S values. Within each class, we normalized the counts for H3K4me2, H3K4me3, H3K9ac and H3K27ac in all bins to retain only the shape of the distributions. The region that displays the most variance between the classes corresponds to the position of the first nucleosome downstream of the TSS.
Chen et al. BMC Genomics 2011 12:544 doi:10.1186/1471-2164-12-544
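The per-class shape normalization described for panel C can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions: the caption does not say how the 5 classes were delimited (equal-sized rank-based classes are assumed here), nor which normalization was used (each class profile is scaled to sum to 1), and the function names and toy counts are hypothetical.

```python
from statistics import pvariance


def split_into_classes(s_values, n_classes=5):
    """Group promoter indices into n_classes by rank of their S value.

    Assumption: equal-sized rank-based classes; the caption only states
    that the promoters were divided into 5 classes depending on S.
    """
    order = sorted(range(len(s_values)), key=lambda i: s_values[i])
    classes = [[] for _ in range(n_classes)]
    for rank, idx in enumerate(order):
        classes[rank * n_classes // len(order)].append(idx)
    return classes


def class_profile(counts, members):
    """Sum per-bin counts over one class and scale the profile to sum to 1.

    Assumption: dividing by the class total is one way to keep only the
    shape of the distribution; the caption does not specify the method.
    """
    n_bins = len(counts[0])
    totals = [sum(counts[i][b] for i in members) for b in range(n_bins)]
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0] * n_bins
    return [t / grand_total for t in totals]


def most_variable_bin(profiles):
    """Return the index of the bin whose normalized value varies most across classes."""
    n_bins = len(profiles[0])
    variances = [pvariance([p[b] for p in profiles]) for b in range(n_bins)]
    return max(range(n_bins), key=variances.__getitem__)


# Toy data (hypothetical): 10 promoters, 4 positional bins for one histone mark.
# Only the third bin (index 2) changes with the S value.
s_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
counts = [[5, 5, 1 + 2 * (i // 2), 5] for i in range(10)]

classes = split_into_classes(s_values)
profiles = [class_profile(counts, members) for members in classes]
print(most_variable_bin(profiles))
```

With real data, `counts` would hold the binned read counts of one mark per promoter, and the same steps would be repeated for each of H3K4me2, H3K4me3, H3K9ac and H3K27ac.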