Research article (Open Access)

Batch effect correction for genome-wide methylation data with Illumina Infinium platform

Zhifu Sun¹, High Seng Chai¹, Yanhong Wu², Wendy M White³, Krishna V Donkena⁴, Christopher J Klein⁵, Vesna D Garovic⁶, Terry M Therneau¹ and Jean-Pierre A Kocher¹*

Author Affiliations

1 Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First Street, Rochester, MN 55905, USA

2 Genomic Shared Resources, Mayo Clinic College of Medicine, 200 First Street, Rochester, MN 55905, USA

3 Division of Maternal Fetal Medicine, Department of Obstetrics & Gynecology, Mayo Clinic College of Medicine, 200 First Street, Rochester, MN 55905, USA

4 Department of Urology, Mayo Clinic College of Medicine, 200 First Street, Rochester, MN 55905, USA

5 Department of Neurology, Mayo Clinic College of Medicine, 200 First Street, Rochester, MN 55905, USA

6 Division of Nephrology and Hypertension, Department of Internal Medicine, Mayo Clinic College of Medicine, 200 First Street, Rochester, MN 55905, USA


BMC Medical Genomics 2011, 4:84  doi:10.1186/1755-8794-4-84

Published: 16 December 2011

Additional files

Additional file 1:

Fitted lowess curves of M-A plots for Datasets 2 and 3. The x-axis shows the mean methylation of each CpG across all samples, and the y-axis shows the difference between each sample and that mean. Each curve represents one sample; red and green mark samples from two different batches. A: Dataset 2, red for Chip11 and green for Chip12. B: Dataset 3, red for Chip54 and green for Chip36. Both datasets show clear non-linear, intensity-dependent biases.

Format: PDF, 59 KB
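The M-A construction described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a CpG-by-sample matrix of methylation values (for example beta values) and uses a hand-written, single-pass lowess (local linear fits with tricube weights, no robustness iterations). The function names `ma_values` and `lowess_fit`, the `frac=0.3` span and the simulated bias are all hypothetical.

```python
import numpy as np


def ma_values(beta, sample):
    """M-A coordinates for one sample of a CpG-by-sample matrix.

    A: mean methylation of each CpG across all samples.
    M: difference between the chosen sample and that mean.
    """
    a = beta.mean(axis=1)
    m = beta[:, sample] - a
    return a, m


def lowess_fit(x, y, frac=0.3):
    """Single-pass lowess (hypothetical helper, no robustness iterations).

    At each x[i], fit a straight line by weighted least squares to the
    nearest ceil(frac * n) points using tricube weights, and return the
    fitted value at x[i].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]  # k nearest neighbors of x[i]
        h = dist[idx].max()
        # Tricube weights; fall back to equal weights if all neighbors coincide.
        w = (1 - (dist[idx] / h) ** 3) ** 3 if h > 0 else np.ones(k)
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(k), x[idx]])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted


# Simulated example: 500 CpGs x 6 samples, with a hypothetical
# intensity-dependent bias added to sample 0 only.
rng = np.random.default_rng(0)
beta = rng.uniform(0.05, 0.95, size=(500, 6))
beta[:, 0] += 0.15 * beta[:, 0] * (1 - beta[:, 0])
a, m = ma_values(beta, 0)
order = np.argsort(a)
curve = lowess_fit(a[order], m[order])  # one curve per sample, plotted against a[order]
```

Drawing one such curve per sample and coloring it by chip gives plots of the kind shown in panels A and B; a curve that bends away from M = 0 as A changes is what the caption calls an intensity-dependent bias.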

Additional file 2:

Distribution of differential-methylation p values for positive and negative control CpGs between prostate cancer and normal samples in Dataset 3, before and after normalization/batch correction. The positive CpGs (85) were selected from genes frequently reported in the literature as hypermethylated in prostate cancer. The negative CpGs (358) were selected from housekeeping genes. A: After normalization alone and after normalization plus EB (empirical Bayes) batch correction, the number of differentially methylated positive CpGs increases compared with un-normalized data. B: The p values for negative CpGs are almost uniformly distributed, with no indication of bias introduced by normalization or batch correction; in every case, fewer negative CpGs reach p < 0.05 than the roughly 18 expected by chance (358 × 0.05 ≈ 17.9).

Format: PDF, 34 KB
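The "expected 18" in panel B is the number of false positives expected among 358 true-null tests at α = 0.05, assuming null p values are uniform on (0, 1). The sketch below reproduces that arithmetic and, under the additional assumption that the tests are independent (neighboring CpGs are often correlated, so this is only an approximation), gives the binomial probability of seeing at most k significant negative CpGs. The function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected count of p < alpha among n_tests true-null tests
    (null p values assumed Uniform(0, 1))."""
    return n_tests * alpha


def prob_at_most(k, n_tests, alpha):
    """P(X <= k) for X ~ Binomial(n_tests, alpha); assumes independent tests."""
    return sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k + 1)
    )


n_negative = 358  # negative-control CpGs from housekeeping genes
alpha = 0.05
print(expected_false_positives(n_negative, alpha))  # about 17.9, i.e. roughly 18
print(prob_at_most(17, n_negative, alpha))          # chance of 17 or fewer under the null
```

Under these assumptions, counts somewhat below 18 are unremarkable for null CpGs, which fits the caption's reading that normalization and batch correction did not inflate false positives.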

Additional file 3:

Batch effect correction by distance weighted discrimination (DWD). DWD effectively removes batch effects. However, the numbers of significant CpGs associated with the study outcome are all lower than those for the EB-corrected data in Table 1 of the main text. This is likely because DWD does not incorporate biological covariates in its adjustment model, which can compromise true biological signals.

Format: PDF, 66 KB
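The caption's explanation, that leaving biological covariates out of the adjustment model can erode real signal, can be illustrated with a toy linear model. This sketch is neither DWD nor the EB method from the main text: it contrasts simple per-batch mean-centering (which ignores biology) with a least-squares fit of batch plus outcome in which only the batch terms are removed. The noise-free CpG values, the unbalanced batch layout and all function names are hypothetical.

```python
import numpy as np


def naive_center(y, batch):
    """Subtract each batch's mean, ignoring biology
    (a location-only, covariate-free adjustment; not DWD itself)."""
    out = y.astype(float)
    for b in np.unique(batch):
        mask = batch == b
        out[mask] -= y[mask].mean()
    return out


def covariate_adjust(y, batch, group):
    """Fit y ~ intercept + batch indicators + group by least squares and
    remove only the fitted batch terms, keeping the group effect."""
    levels = np.unique(batch)
    batch_cols = [(batch == b).astype(float) for b in levels[1:]]
    design = np.column_stack([np.ones(len(y)), *batch_cols, group.astype(float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    n_b = len(batch_cols)
    batch_effect = design[:, 1:1 + n_b] @ coef[1:1 + n_b]
    return y - batch_effect


def group_diff(y, group):
    """Mean of group 1 minus mean of group 0."""
    return y[group == 1].mean() - y[group == 0].mean()


batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])
group = np.array([1, 1, 1, 0, 0, 0, 0, 1])  # outcome unbalanced across batches
y = 0.2 + 0.3 * group + 0.1 * batch         # hypothetical noise-free CpG

print(group_diff(naive_center(y, batch), group))             # about 0.225
print(group_diff(covariate_adjust(y, batch, group), group))  # about 0.3
```

Because the outcome is unevenly distributed across the two batches, per-batch centering subtracts part of the biological difference along with the batch shift (0.3 shrinks to 0.225 here), whereas the covariate-aware fit removes only the batch term and keeps the full difference.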