Combinatorial epigenetic patterns as quantitative predictors of chromatin biology
Department of Biochemistry and Molecular Genetics, University of Virginia Health System, Charlottesville, Virginia, USA
BMC Genomics 2014, 15:76 doi:10.1186/1471-2164-15-76Published: 28 January 2014
Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) is the most widely used method for characterizing the epigenetic states of chromatin on a genomic scale. With the recent availability of large genome-wide data sets, often comprising several epigenetic marks, novel approaches are required to explore functionally relevant interactions between histone modifications. Computational discovery of "chromatin states" defined by such combinatorial interactions enabled descriptive annotations of genomes, but more quantitative approaches are needed to progress towards predictive models.
We propose non-negative matrix factorization (NMF) as a new unsupervised method to discover combinatorial patterns of epigenetic marks that frequently co-occur in subsets of genomic regions. We show that this small set of combinatorial "codes" can be effectively displayed and interpreted. NMF codes enable dimensionality reduction and have desirable statistical properties for regression and classification tasks. We demonstrate the utility of codes in the quantitative prediction of Pol2-binding and the discrimination between Pol2-bound promoters and enhancers. Finally, we show that specific codes can be linked to molecular pathways and targets of pluripotency genes during differentiation.
We have introduced and evaluated a new computational approach to represent combinatorial patterns of epigenetic marks as quantitative variables suitable for predictive modeling and supervised machine learning. To foster widespread adoption of this method we make it available as an open-source software-package – epicode at https://github.com/mcieslik-mctp/epicode webcite.