Empirical Bayes accomodation of batch-effects in microarray data using identical replicate reference samples: application to RNA expression profiling of blood from Duchenne muscular dystrophy patients
1 Department of Neurology and MIND Institute, University of California at Davis, Sacramento, California, USA
2 Genome Center and Department of Statistics, University of California, Davis, CA, USA
3 Department of Pediatric Neurology, Cincinnati Children's Hospital and Medical Center, University of Cincinnati, Cincinnati, OH, USA
4 Department of Epidemiology and Biostatistics, School of Medicine, University of California, San Francisco, CA, USA
BMC Genomics 2008, 9:494 doi:10.1186/1471-2164-9-494
Published: 20 October 2008
Non-biological experimental error routinely occurs in microarray data collected in different batches. It is often impossible to compare groups of samples from independent experiments because batch effects confound true gene expression differences. Existing methods can correct for batch effects only when samples from all biological groups are represented in every batch.
In this report we describe a generalized empirical Bayes approach to correct for cross-experimental batch effects, allowing direct comparisons of gene expression between biological groups from independent experiments. The proposed experimental design uses identical reference samples in each batch of every experiment. These reference samples are from the same tissue as the experimental samples. This design, with tissue-matched reference samples, allows a gene-by-gene correction to be performed using fewer arrays than currently available methods require. We examine the effects of non-biological variation both within a single experiment and between experiments.
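The role of the shared reference samples can be illustrated with a minimal sketch. It is not the empirical Bayes procedure of this report: it applies only a plain gene-by-gene location shift, with no shrinkage or variance adjustment, and it assumes log-scale expression values. The function name `reference_shift_correct` and the arguments `expr`, `batch`, and `is_ref` are hypothetical names chosen for illustration.

```python
import numpy as np

def reference_shift_correct(expr, batch, is_ref):
    """Gene-by-gene location correction using shared reference samples.

    Simplified illustration only (no empirical Bayes shrinkage).
    expr   : (n_genes, n_samples) array of log-scale expression values
    batch  : length n_samples array of batch labels
    is_ref : length n_samples boolean array, True for reference samples
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    is_ref = np.asarray(is_ref, dtype=bool)

    # Per-gene mean over all reference arrays, pooled across batches.
    grand_ref_mean = expr[:, is_ref].mean(axis=1)

    corrected = expr.copy()
    for b in np.unique(batch):
        in_batch = batch == b
        ref_cols = in_batch & is_ref
        if not ref_cols.any():
            raise ValueError(f"batch {b!r} has no reference sample")
        # Per-gene offset of this batch, estimated from its reference arrays.
        offset = expr[:, ref_cols].mean(axis=1) - grand_ref_mean
        # Remove the offset from every array in the batch.
        corrected[:, in_batch] -= offset[:, None]
    return corrected

# Toy data: 2 genes, 2 batches, one reference and two samples per batch.
# Batch "B" carries an artificial shift of +1.0 (gene 0) and -0.5 (gene 1).
expr = np.array([[5.0, 6.0, 7.0, 6.0, 7.0, 9.0],
                 [3.0, 3.0, 4.0, 2.5, 1.5, 3.5]])
batch = np.array(["A", "A", "A", "B", "B", "B"])
is_ref = np.array([True, False, False, True, False, False])

corrected = reference_shift_correct(expr, batch, is_ref)
```

Because the reference material is identical in every batch, any per-gene difference between the reference arrays of two batches is attributed to the batch rather than to biology. That is why, in this sketch, a batch can be corrected even when it contains only one biological group.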
Batch correction has a significant impact on which genes are identified as differentially regulated. Using this method, gene expression in the blood of patients with Duchenne muscular dystrophy is shown to differ from that of controls for hundreds of genes. The number of genes identified, and which specific genes they are, depends on whether between-experiment and/or between-batch corrections are performed.