Open Access Methodology article

Regression hidden Markov modeling reveals heterogeneous gene expression regulation: a case study in mouse embryonic stem cells

Yeonok Lee1, Debashis Ghosh1,2* and Yu Zhang1*

Author Affiliations

1 Department of Statistics, Penn State University, University Park, PA 16802, USA

2 Department of Public Health Sciences, Penn State University, University Park, PA 16802, USA

BMC Genomics 2014, 15:360  doi:10.1186/1471-2164-15-360

Published: 12 May 2014



Studies have shown a strong association between histone modification levels and gene expression levels. The detailed relationship between the two can vary substantially because of differential regulation, so a single regression model may not be adequate. We apply a regression hidden Markov model (regHMM) to investigate the potentially multiple relationships between gene expression and histone methylation levels in mouse embryonic stem cells.
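For orientation, a two-state regression hidden Markov model of this general kind can be written as below. This is a generic sketch and not necessarily the authors' exact specification: the symbols y_g (expression of gene g), x_g (its vector of histone methylation predictors), z_g (hidden state), the state-specific coefficients, the variances, and the transition matrix A are all notation assumed here, as is the ordering of genes g = 1, ..., G along the genome.

```latex
\begin{aligned}
z_g \mid z_{g-1} = j \;&\sim\; \mathrm{Categorical}\left(A_{j1}, A_{j2}\right), \qquad z_g \in \{1, 2\},\\
y_g \mid z_g = k,\; x_g \;&\sim\; \mathcal{N}\!\left(\beta_0^{(k)} + x_g^{\top}\beta^{(k)},\; \sigma_k^{2}\right).
\end{aligned}
```

Under this formulation each hidden state carries its own intercept, slopes, and residual variance, so genes assigned to different states can follow different regressions on the same predictors.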


Seven histone methylation marks are used in the study. Histone modification levels averaged over non-overlapping 200 bp windows across the transcription start site (TSS) ± 1 kb are used as predictors, giving 70 explanatory variables in total (seven marks × ten windows). Based on the regHMM results, genes are segregated into two groups, referred to as State 1 and State 2, with distinct association strengths. Genes in State 1 are better explained by histone methylation levels (R² = 0.72), while those in State 2 show a weaker association (R² = 0.38). The regression coefficients in the two states do not differ much in magnitude except for the intercept, which is 0.25 for State 1 and 1.15 for State 2. We found specific GO categories that may be attributed to the different relationships. The GO categories more frequently observed in State 2 match those of housekeeping genes, such as cytoplasm, nucleus, and protein binding. In addition, housekeeping gene expression levels are significantly less well explained by histone methylation in mouse embryonic stem cells, which is consistent with the constitutive expression patterns expected of such genes.
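The count of 70 explanatory variables follows from averaging each of the seven signals over ten non-overlapping 200 bp windows spanning TSS ± 1 kb. The sketch below illustrates that arithmetic on random toy data. It assumes each mark is available as a per-base signal of length 2000; the function names `window_average` and `build_predictors` are hypothetical and are not taken from the paper.

```python
import numpy as np


def window_average(signal, window=200):
    """Average a per-base signal over non-overlapping windows of `window` bp."""
    signal = np.asarray(signal, dtype=float)
    if signal.size % window != 0:
        raise ValueError("signal length must be a multiple of the window size")
    # Each row of the reshaped array is one window; take the row means.
    return signal.reshape(-1, window).mean(axis=1)


def build_predictors(signals_by_mark, window=200):
    """Concatenate the window averages of every mark into one feature vector."""
    return np.concatenate([window_average(s, window) for s in signals_by_mark])


# Toy data (assumption): 7 marks, each a 2000 bp signal covering TSS - 1 kb to TSS + 1 kb.
rng = np.random.default_rng(0)
n_marks, region_bp = 7, 2000
signals = [rng.random(region_bp) for _ in range(n_marks)]

features = build_predictors(signals)
print(features.shape)  # (70,)
```

Stacking one such vector per gene gives the design matrix on which the state-specific regressions are fitted.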


Gene expression levels are not universally affected by histone methylation levels, and the relationship between the two differs by gene function. The expression levels of genes annotated with the GO categories most common among housekeeping genes are less strongly associated with histone methylation levels. We suspect that additional biological factors may also be strongly associated with gene expression levels in State 2. We find that the effect of the presence of a CpG island within TSS ± 1 kb is larger in State 2.

Regression hidden Markov model; Histone modification; Gene expression level; Mouse embryonic stem cell