Open Access Research article

Histone modification profiles are predictive for tissue/cell-type specific expression of both protein-coding and microRNA genes

Zhihua Zhang and Michael Q Zhang*

Author Affiliations

Department of Molecular Cell Biology, Center for Systems Biology, University of Texas at Dallas, 800 W Campbell Road, Richardson, TX 75080, USA

BMC Bioinformatics 2011, 12:155  doi:10.1186/1471-2105-12-155

Published: 14 May 2011

Additional files

Additional file 1:

The distribution of gene expression across tissues. The names of the tissues are the same as those shown in the GNF symAtlas dataset. A, B, C) The distributions for CD4SE, HK, and randomly chosen genes, respectively.

Format: PDF Size: 305KB

Additional file 2:

The scree plot of the principal component analysis. The x-axis gives the index of each principal component, and the y-axis gives the proportion of variance.

Format: PDF Size: 82KB

Additional file 3:

Top selected HMV features by the TCSR model in gene bodies. The x-axis shows the number of times an HMV feature was selected as the top predictive feature in 100 replicates. A, B) The HMV features selected from Set I for CpG and nonCpG genes, respectively; C, D) the features selected from Set II for CpG and nonCpG genes, respectively.

Format: PDF Size: 100KB

Additional file 4:

Top bi-combinations of selected HMV features by the TCSR model. The x-axis shows the total number of times the two HMV types were selected as the first and second most predictive features in 100 replicates, irrespective of the nucleosome index of the HMV features. The y-axis indicates the combinations of two HMV types in "first_second" order. A, B) The combinations selected from Set I for CpG- and nonCpG-related promoters, respectively; C, D) the combinations selected from Set II for CpG- and nonCpG-related promoters, respectively; E, F) the combinations selected from Set I for CpG- and nonCpG-related gene bodies, respectively; G, H) the combinations selected from Set II for CpG- and nonCpG-related gene bodies, respectively.

Format: PDF Size: 298KB

Additional file 5:

Top selected HMV features by the TCSR model. The x-axis shows the number of times an HMV feature was selected as one of the top two predictive HMV features in 100 replicates. "p" and "m" stand for the "+" and "-" strands, respectively, and are followed by the index of the nucleosome, either downstream or upstream of the TSS. "avg" means the average tag number of the HMV type; "body", "1stExon", and "1stIntron" mean that the calculation was performed over the entire gene body region, the first exon, and the first intron, respectively. A, B) HMVs selected from Set I for CpG- and nonCpG-related promoters, respectively; C, D) HMVs selected from Set II for CpG- and nonCpG-related promoters, respectively; E, F) HMVs selected from Set I for CpG- and nonCpG-related gene bodies, respectively; G, H) HMVs selected from Set II for CpG- and nonCpG-related gene bodies, respectively.

Format: PDF Size: 297KB

Additional file 6:

The distribution of gene expression across tissues. The names of the tissues are the same as those shown in the GNF symAtlas dataset. A) CD4+ T cell specific genes predicted by the TCSR model. B) Genes predicted to be highly expressed in CD4+ T cells by the gene expression activity model of Karlic et al.

Format: PDF Size: 243KB

Additional file 7:

The HMV types highly correlated with the enhancer marker H3K4me1. All HMV types whose Pearson's correlation coefficient with H3K4me1 is higher than 0.2 in CD4SE genes are listed here. HMV names in bold font were selected as a top predictive feature by CoreBoost at least once in 100 replicates.

Format: DOC Size: 24KB

Additional file 8:

The performance of classifiers. The classifiers were trained on the subset of the CD4+ T cell HMV profile that excludes H3K4me3, H3K4me2, H3K79me3, and H3K27ac. Averages and errors are given as the mean and standard deviation, respectively, from 100 replicates. Performance was measured by applying the classifiers to protein-coding genes in CD4+ T cells. The significance of the comparison between the performance of CoreBoost trained on features in those regions and in control regions is indicated by a symbol next to each number. No symbol indicates p-value < 1e-5; * indicates p-value < 1e-2 and >= 1e-5; indicates p-value > 1e-2.

Format: DOC Size: 23KB

Additional file 9:

Gene name list. This file lists the CD4+ T cell specific and housekeeping genes, both protein-coding and miRNA.

Format: DOC Size: 933KB