Research article

Sources of variation in Affymetrix microarray experiments

Stanislav O Zakharkin1, Kyoungmi Kim1, Tapan Mehta1, Lang Chen1, Stephen Barnes2, Katherine E Scheirer3, Rudolph S Parrish4, David B Allison1 and Grier P Page1*

Author affiliations

1 Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama, USA

2 Departments of Pharmacology and Toxicology, University of Alabama at Birmingham, Birmingham, Alabama, USA

3 Heflin Center for Human Genetics, University of Alabama at Birmingham, Birmingham, Alabama, USA

4 Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, USA

Citation and License

BMC Bioinformatics 2005, 6:214  doi:10.1186/1471-2105-6-214

Published: 29 August 2005



A typical microarray experiment has many sources of variation, which can be attributed to biological and technical causes. Identifying sources of variation and assessing their magnitude are, among other factors, important for optimal experimental design. The objectives of this study were: (1) to estimate the relative magnitudes of different sources of variation and (2) to evaluate agreement among biological replicates and among technical replicates.


We performed a microarray experiment using a total of 24 Affymetrix GeneChip® arrays. The study included samples of the 4th mammary gland from eight 21-day-old female Sprague Dawley CD rats exposed to genistein (a soy isoflavone). RNA samples from each rat were split to assess variation arising at the labeling and hybridization steps. A general linear model was used to estimate variance components. Pearson correlations were computed to evaluate agreement among technical replicates and among biological replicates.
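As an illustration of what estimating variance components involves, the following is a minimal sketch for a single gene, not the model actually fitted in this study. It assumes a balanced nested layout (hybridizations within labeling reactions within animals) and uses the classical method-of-moments estimators from nested ANOVA. The function name, the array layout, and the simulated standard deviations are all hypothetical, and the simulated design (8 x 2 x 2 = 32 arrays) is chosen only for balance and does not reproduce the 24-array design described above.

```python
import numpy as np

def nested_variance_components(y):
    """Method-of-moments variance components for a balanced nested design.

    y has shape (animals, labelings per animal, hybridizations per labeling)
    and holds the log-scale expression of ONE gene. This layout is an
    assumption made for illustration only.
    """
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand_mean = y.mean()
    animal_means = y.mean(axis=(1, 2))   # one mean per animal
    label_means = y.mean(axis=2)         # one mean per labeling reaction

    # Mean squares for each level of the hierarchy.
    ms_animal = b * n * np.sum((animal_means - grand_mean) ** 2) / (a - 1)
    ms_label = n * np.sum((label_means - animal_means[:, None]) ** 2) / (a * (b - 1))
    ms_resid = np.sum((y - label_means[:, :, None]) ** 2) / (a * b * (n - 1))

    # Solve the expected-mean-square equations; negative estimates are set to 0.
    var_resid = ms_resid
    var_label = max((ms_label - ms_resid) / n, 0.0)
    var_animal = max((ms_animal - ms_label) / (b * n), 0.0)
    return {"biological": var_animal, "labeling": var_label, "residual": var_resid}

# Simulated data only: the standard deviations below are arbitrary choices.
rng = np.random.default_rng(0)
a, b, n = 8, 2, 2
animal_effect = rng.normal(0.0, 0.5, size=(a, 1, 1))
label_effect = rng.normal(0.0, 0.1, size=(a, b, 1))
residual = rng.normal(0.0, 0.2, size=(a, b, n))
y = 8.0 + animal_effect + label_effect + residual

components = nested_variance_components(y)
total = sum(components.values())
for source, value in components.items():
    print(f"{source}: {value:.4f} ({100 * value / total:.1f}% of total)")
```

Repeating this per gene and summarizing the resulting proportions across genes gives one way to rank sources of variation; an unbalanced design would call for a mixed-model fit rather than these closed-form estimators.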


When *.cel files were processed with the dChip or RMA image processing algorithms, the greatest source of variation was biological variation, followed by residual error and then variation due to labeling. When MAS 5.0 or GCRMA-EB was used, the greatest source of variation was residual error, followed by biological variation and labeling. Correlations between technical replicates were consistently higher than those between biological replicates.
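The replicate-agreement comparison can be sketched as follows. This is an illustrative example on simulated data, not the study's arrays or results: the gene count, the noise levels, and the variable names are assumptions chosen so that technical noise is smaller than animal-to-animal variation. Pearson's r is computed across genes for a pair of arrays from the same animal (technical replicates) and for a pair of arrays from different animals (biological replicates).

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Simulated log-scale expression profiles (all parameters are assumptions).
rng = np.random.default_rng(1)
n_genes = 2000
baseline = rng.normal(8.0, 2.0, n_genes)             # shared gene-level signal
animal_a = baseline + rng.normal(0.0, 0.5, n_genes)  # biological deviation, animal A
animal_b = baseline + rng.normal(0.0, 0.5, n_genes)  # biological deviation, animal B

# Two arrays from animal A (technical replicates) and one array from animal B.
a_rep1 = animal_a + rng.normal(0.0, 0.2, n_genes)
a_rep2 = animal_a + rng.normal(0.0, 0.2, n_genes)
b_rep1 = animal_b + rng.normal(0.0, 0.2, n_genes)

r_technical = pearson(a_rep1, a_rep2)
r_biological = pearson(a_rep1, b_rep1)
print(f"technical replicates:  r = {r_technical:.3f}")
print(f"biological replicates: r = {r_biological:.3f}")
```

In this simulation the technical pair shares both the gene-level signal and the animal-specific deviation, so its correlation is higher by construction; with real arrays the same calculation would be applied to every replicate pair after preprocessing.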