Measuring and predicting promoter usage. A) Conceptual framework for predicting promoter activity. The ±975 bp region around each annotated promoter was divided into 13 bins, with the center bin centered on the TSS. Within each bin we count the tags per million from the available ChIP-seq data, which correspond to enrichment signals of various chromatin marks or other data. These 13-element feature vectors are the input to a Random Forest or linear model that either classifies two sets of promoters from each other or predicts promoter usage as measured by various experimental methods (Panel B). B) Illustration of the different ways of measuring promoter usage. We measure promoter usage in three ways: i) mRNA expression, as the sum of RNA-Seq tags in the first 1,000 bp of exonic sequence downstream of the TSS, excluding tags from introns; ii) RNAPII recruitment, as the sum of RNAPII ChIP tags in the region around the TSS (-300 to +300); and iii) RNAPII elongation, as the sum of RNAPII tags in the gene body (+300 to +1,000).
Chen et al. BMC Genomics 2011 12:544 doi:10.1186/1471-2164-12-544
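The binning in Panel A and the RNAPII windows in Panel B can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes equal-width bins (1,950 bp / 13 = 150 bp each), tags represented by a single genomic coordinate, and half-open windows. The function names (`binned_tpm`, `window_tpm`) and the strand handling are hypothetical choices made for the illustration.

```python
HALF_WINDOW = 975                        # bp on each side of the TSS (from the caption)
N_BINS = 13                              # number of bins (from the caption)
BIN_WIDTH = 2 * HALF_WINDOW // N_BINS    # 150 bp, assuming equal-width bins


def relative_position(pos, tss, strand="+"):
    """Tag coordinate relative to the TSS, oriented so downstream is positive."""
    return pos - tss if strand == "+" else tss - pos


def binned_tpm(tag_positions, tss, total_tags, strand="+"):
    """13-element feature vector: tags per million in each bin around the TSS.

    Bins are half-open, [start, start + 150); the center bin (index 6)
    covers -75 to +75 and is therefore centered on the TSS.
    """
    counts = [0] * N_BINS
    for pos in tag_positions:
        rel = relative_position(pos, tss, strand)
        if -HALF_WINDOW <= rel < HALF_WINDOW:
            counts[(rel + HALF_WINDOW) // BIN_WIDTH] += 1
    scale = 1e6 / total_tags             # normalize by library size
    return [c * scale for c in counts]


def window_tpm(tag_positions, tss, total_tags, start, end, strand="+"):
    """Tags per million in the half-open window [start, end) relative to the TSS."""
    n = sum(
        1
        for pos in tag_positions
        if start <= relative_position(pos, tss, strand) < end
    )
    return n * 1e6 / total_tags


if __name__ == "__main__":
    # Hypothetical toy data: five tags near a plus-strand TSS at position 10,000.
    tags = [10000, 10010, 9000, 10900, 9100]
    features = binned_tpm(tags, tss=10000, total_tags=1_000_000)
    recruitment = window_tpm(tags, 10000, 1_000_000, -300, 300)   # Panel B, ii
    elongation = window_tpm(tags, 10000, 1_000_000, 300, 1000)    # Panel B, iii
    print(features, recruitment, elongation)
```

Applying `binned_tpm` to one ChIP-seq data set gives the 13 values for one promoter. How vectors from several data sets are combined before model fitting is not specified in the caption and is left open here.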