Open Access Research article

Batch correction of microarray data substantially improves the identification of genes differentially expressed in Rheumatoid Arthritis and Osteoarthritis

Peter Kupfer1, Reinhard Guthke1, Dirk Pohlers23, Rene Huber24, Dirk Koczan5 and Raimund W Kinne2*

Author affiliations

1 Research Group Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knöll Institute, Jena, Germany

2 Experimental Rheumatology Unit, Department of Orthopedics, University Hospital Jena, Friedrich Schiller University, Jena

3 Present address: Center of diagnostics GmbH, Chemnitz Hospital, Chemnitz, Germany

4 Present address: Institute of Clinical Chemistry, Hannover Medical School, Hannover, Germany

5 Institute of Immunology, University of Rostock, Rostock, Germany

For all author emails, please log on.

Citation and License

BMC Medical Genomics 2012, 5:23  doi:10.1186/1755-8794-5-23

Published: 8 June 2012



Batch effects due to sample preparation or array variation (type, charge, and/or platform) may influence the results of microarray experiments and thus mask and/or confound true biological differences. Of the published approaches for batch correction, the algorithm “Combating Batch Effects When Combining Batches of Gene Expression Microarray Data” (ComBat) appears to be most suitable for small sample sizes and multiple batches.


Synovial fibroblasts (SFB; purity > 98%) were obtained from rheumatoid arthritis (RA) and osteoarthritis (OA) patients (n = 6 each) and stimulated with TNF-α or TGF-β1 for 0, 1, 2, 4, or 12 hours. Gene expression was analyzed using Affymetrix Human Genome U133 Plus 2.0 chips, an alternative chip definition file, and normalization by Robust Multi-Array Analysis (RMA). Data were batch-corrected for different acquiry dates using ComBat and the efficacy of the correction was validated using hierarchical clustering.


In contrast to the hierarchical clustering dendrogram before batch correction, in which RA and OA patients clustered randomly, batch correction led to a clear separation of RA and OA. Strikingly, this applied not only to the 0 hour time point (i.e., before stimulation with TNF-α/TGF-β1), but also to all time points following stimulation except for the late 12 hour time point. Batch-corrected data then allowed the identification of differentially expressed genes discriminating between RA and OA. Batch correction only marginally modified the original data, as demonstrated by preservation of the main Gene Ontology (GO) categories of interest, and by minimally changed mean expression levels (maximal change 4.087%) or variances for all genes of interest. Eight genes from the GO category “extracellular matrix structural constituent” (5 different collagens, biglycan, and tubulointerstitial nephritis antigen-like 1) were differentially expressed between RA and OA (RA > OA), both constitutively at time point 0, and at all time points following stimulation with either TNF-α or TGF-β1.


Batch correction appears to be an extremely valuable tool to eliminate non-biological batch effects, and allows the identification of genes discriminating between different joint diseases. RA-SFB show an upregulated expression of extracellular matrix components, both constitutively following isolation from the synovial membrane and upon stimulation with disease-relevant cytokines or growth factors, suggesting an “imprinted” alteration of their phenotype.

Microarray analysis; Batch correction; Rheumatoid arthritis; Osteoarthritis; Collagen; Extracellular matrix