Open Access Research article

Identifying hypermethylated CpG islands using a quantile regression model

Shuying Sun1,2*, Zhengyi Chen3, Pearlly S Yan4, Yi-Wen Huang4, Tim HM Huang4 and Shili Lin5

Author Affiliations

1 Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, Ohio, USA

2 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, USA

3 Department of Statistics, Case Western Reserve University, Cleveland, Ohio, USA

4 Human Cancer Genetics Program, The Ohio State University, Columbus, Ohio, USA

5 Department of Statistics, The Ohio State University, Columbus, Ohio, USA


BMC Bioinformatics 2011, 12:54  doi:10.1186/1471-2105-12-54

Published: 15 February 2011



DNA methylation has been shown to play an important role in the silencing of tumor suppressor genes in various tumor types. To gain a system-wide understanding of the methylation changes that occur in tumors, we have developed a differential methylation hybridization (DMH) protocol that can simultaneously assay the methylation status of all known CpG islands (CGIs) using microarray technologies. A large percentage of signals obtained from microarrays can be attributed to various measurable and unmeasurable confounding factors unrelated to the biological question at hand. To correct the bias due to this noise, in an earlier work we implemented a quantile regression model, with a quantile level of 75%, to identify hypermethylated CGIs. As a proof of concept, we applied this model to methylation microarray data generated from breast cancer cell lines. However, we were unsure whether 75% was the best quantile level for identifying hypermethylated CGIs. In this paper, we attempt to determine which quantile level should be used to identify hypermethylated CGIs and their associated genes.


We introduce three statistical measurements to compare the performance of the proposed quantile regression model at different quantile levels (95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%), using known methylated genes and unmethylated housekeeping genes reported in breast cancer cell lines and ovarian cancer patients. Our results show that the quantile levels ranging from 80% to 90% are better at identifying known methylated and unmethylated genes.


In this paper, we propose to use a quantile regression model to identify hypermethylated CGIs by incorporating probe effects to account for noise due to unmeasurable factors. Our model can efficiently identify hypermethylated CGIs in both breast and ovarian cancer data.
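
To make the role of the quantile level concrete, the following minimal sketch shows the check (pinball) loss that quantile regression minimizes, reduced to the simplest intercept-only case. It is an illustration under stated assumptions, not the model used in this paper: it omits the probe effects entirely, and the function names and the toy signal values are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss of one residual at quantile level tau (0 < tau < 1)."""
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by `candidate`."""
    return sum(check_loss(v - candidate, tau) for v in values)


def fit_constant_quantile(values, tau):
    """Intercept-only quantile regression (hypothetical helper).

    A minimizer of the check loss is always attained at one of the data
    points, so a search over the observed values is sufficient.
    """
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))


def flag_above_quantile(values, tau):
    """Flag values lying strictly above the fitted tau-level quantile."""
    cutoff = fit_constant_quantile(values, tau)
    return [v > cutoff for v in values]


# Toy signals (assumed for illustration only, not data from the paper).
signals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cutoff_75 = fit_constant_quantile(signals, 0.75)
flags_75 = flag_above_quantile(signals, 0.75)
```

Raising the quantile level moves the fitted cutoff upward, so fewer values are flagged. That trade-off is what the comparison of quantile levels from 60% to 95% explores, here shown only in toy form.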