Profile analysis and prediction of tissue-specific CpG island methylation classes
1 Department of Molecular Biophysics, DKFZ, German Cancer Research Center, Heidelberg, Germany
2 Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communication Technology), University of Granada, Granada, 18071, Spain
3 Department of Molecular Microbiology, Howard Hughes Medical Institute, Washington University School of Medicine, St Louis, MO, USA
4 Computational Biology Unit, Bergen Center for Computational Science, Sars Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
BMC Bioinformatics 2009, 10:116 doi:10.1186/1471-2105-10-116
Published: 21 April 2009
The computational prediction of DNA methylation has become an important topic in recent years due to its role in the epigenetic control of normal and cancer-related processes. While previous prediction approaches focused solely on differences between methylated and unmethylated DNA sequences, recent experimental results have revealed much more complex patterns of methylation across tissues and time in the human genome. These patterns are only partially described by a binary model of DNA methylation. In this work we propose a novel approach, based on profile analysis of tissue-specific methylation, that uncovers significant differences in the sequences of CpG islands (CGIs) that predispose them to a tissue-specific methylation pattern.
We defined CGI methylation profiles that not only separate constitutively methylated from unmethylated CGIs, but also identify CGIs showing a differential degree of methylation across tissues and cell types or a lack of methylation exclusively in sperm. These profiles are clearly distinguished by a number of CGI attributes, including their evolutionary conservation, their significance, and the evolutionary evidence of prior methylation. Additionally, we assess profile functionality with respect to the different compartments of protein-coding genes and their possible use in the prediction of DNA methylation.
Our approach provides new insights into the biological features that determine whether a CGI has a functional role in the epigenetic control of gene expression and into the features associated with CGI methylation susceptibility. Moreover, we show that the ability to predict CGI methylation depends primarily on the quality of the biological information used and on the relationships uncovered between different sources of knowledge. The strategy presented here is able to predict, in addition to the constitutively methylated and unmethylated classes, two further tissue-specific methylation classes while preserving the accuracy of leading binary methylation classification methods.