Open Access Highly Accessed Research article

Evidence for sequence biases associated with patterns of histone methylation

Zhong Wang1,2 and Huntington F Willard1*

Author Affiliations

1 Genome Biology Group, Duke Institute for Genome Sciences & Policy, Duke University, 101 Science Dr. CIEMAS 2376, Durham, NC, 27708, USA

2 DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA

BMC Genomics 2012, 13:367  doi:10.1186/1471-2164-13-367

Published: 2 August 2012

Additional files

Additional file 1:

Table S1. Summary of selected regions enriched/depleted for histone marks from human CD4+ T-cells.

Format: XLS Size: 15KB

Additional file 2:

Table S2. Number of samples used for SVM training/testing in human T-cells.

Format: XLS Size: 19KB

Additional file 3:

Table S3. SVM classification for histone marks in human T-cells.

Format: XLS Size: 37KB

Additional file 4:

Figures S1, S2, S3 and S4. Figure S1. Genome-wide predicted locations of H3K4me2, H3K27me3, and H3K9me3 correlate with experimentally determined profiles in human CD4+ T-cells. Figure S2. Genome-wide predicted locations of H3K4me2, H3K27me3, and H3K9me3 correlate with experimentally determined profiles in human CD4+ T-cells. Only data from chr10 are shown as an example, since plots obtained from the rest of the chromosomes look almost identical to those for chr10. Each data point corresponds to the experimentally determined modified histone enrichment level (x-axis) in a 2.5 kb region and the prediction probability from the SVM models (y-axis). Enrichment level 6 stands for >2^6 (more than 64 reads per kb), 5 stands for 2^5 to 2^6 (32-64 reads per kb), and so on. Red bars in each boxplot indicate median values, and red pluses indicate outliers. As enrichment levels go down, the number of regions predicted to be enriched also goes down. Figure S3. Cluster analysis of regions occupied by different epigenetic marks. Hierarchical clustering of histone marks in (a) TSS regions and (b) non-genic regions, based on dissimilarities in the genomic sequences they occupy (measured by SVM misclassification rates). Figure S4. Sequence permutations and their effects on classification. Prediction accuracy of SVM models (trained with original sequences, circles) for singlet (triangles), doublet (diamonds) or CpG (squares) permuted sequences. Sensitivity represents the ability to predict enriched regions, and specificity the ability to predict depleted regions, of a particular methylated histone mark.

Format: PDF Size: 3.1MB

Additional file 5:

Table S4. Predictions between epigenetic marks using SVM models with high cross-validation accuracy (>75%).

Format: XLS Size: 27KB

Additional file 6:

Table S5. Features with consistently high F-scores in multiple rounds of classifications, TSS regions.

Format: XLS Size: 199KB

Additional file 7:

Table S6. SVM classification on ENCODE cell lines for H3K9me3, H3K27me3, H3K4me2.

Format: XLSX Size: 53KB