Identifying differentially methylated genes using mixed effect and generalized least square models
* Corresponding author: Shuying Sun shuying.sun@case.edu
1 Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, Ohio, 44106, USA
2 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, 44106, USA
3 Human Cancer Genetics Program, The Ohio State University, Columbus, Ohio, 43210, USA
4 Department of Statistics, The Ohio State University, Columbus, Ohio, 43210, USA
BMC Bioinformatics 2009, 10:404 doi:10.1186/1471-2105-10-404
Published: 9 December 2009
Abstract
Background
DNA methylation plays an important role in tumorigenesis. Identifying genes, or the CpG islands (CGIs) associated with genes, that are differentially methylated between two tumor subtypes is thus an important biological question. The methylation status of all CGIs in the genome can be assayed with differential methylation hybridization (DMH) microarrays. However, patient samples and cell lines are heterogeneous, so their methylation patterns may differ substantially. In addition, neighboring probes at each CGI are correlated. How these factors affect the analysis of DMH data is unknown.
Results
We propose a new method for identifying differentially methylated (DM) genes by identifying the associated DM CGI(s). At each CGI, we fit four different mixed effect and generalized least squares models to identify DM genes between two groups. We compare these four models with a simple least squares regression model to study the impact of incorporating random effects and correlations.
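To illustrate why correlation among neighboring probes can matter, the following minimal sketch compares the standard error of a group difference at one CGI when probes are treated as independent and when they are treated as equicorrelated within each sample. It is not one of the models in this paper: the compound-symmetry correlation, the balanced design, the function name `group_diff_se`, and all numeric values are assumptions chosen for illustration.

```python
import math

def group_diff_se(sigma, n_samples_per_group, n_probes, rho):
    """Standard error of the difference between two group means at one CGI.

    Assumes (hypothetically) a balanced design in which every sample
    contributes n_probes probe values with common variance sigma**2 and
    a common within-sample correlation rho (compound symmetry).
    """
    # Variance inflation for the mean of n_probes equicorrelated values.
    design_effect = 1.0 + (n_probes - 1) * rho
    n_obs = n_samples_per_group * n_probes
    var_group_mean = sigma ** 2 * design_effect / n_obs
    # Two independent groups of equal size: variances add.
    return math.sqrt(2.0 * var_group_mean)

# Illustrative values only: 5 samples per group, 8 probes per CGI.
naive_se = group_diff_se(sigma=1.0, n_samples_per_group=5, n_probes=8, rho=0.0)
adjusted_se = group_diff_se(sigma=1.0, n_samples_per_group=5, n_probes=8, rho=0.5)
inflation = adjusted_se / naive_se
```

Under these assumptions, the standard error that ignores correlation is smaller than the adjusted one by a factor of sqrt(1 + (n_probes - 1) * rho), so a test statistic built on the independence assumption would be correspondingly too large.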
Conclusions
We demonstrate that the inclusion (or exclusion) of random effects and the choice of correlation structures can significantly affect the results of the data analysis. We also assess the false discovery rate of different models using CGIs associated with housekeeping genes.