Highly predictive sequence features. (A) Distribution of F-scores of 1,364 sequence features for three representative histone marks (the top 10 features are shown in boxes). The null distribution of F-scores (Random) is based on the same sequence features from 100 random sets of 1,000 TSS regions. (B) Highly predictive sequence features for H3K4me2, H3K27me3 and H3K9me3, and their SVM classification accuracy when used as single features. (C) H3K27me3-enriched regions are depleted for Alu retrotransposons. Alu has a much higher F-score than any other repetitive element or any short sequence. The consensus sequence of Alu is shown with highly predictive sequence features highlighted.
Wang and Willard BMC Genomics 2012 13:367 doi:10.1186/1471-2164-13-367