BMC Bioinformatics


This article is part of the supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)

Open Access Research

Comparison of Affymetrix data normalization methods using 6,926 experiments across five array generations

Reija Autio (1,*), Sami Kilpinen (2,3), Matti Saarela (1), Olli Kallioniemi (2,3), Sampsa Hautaniemi (4) and Jaakko Astola (1)

Author Affiliations

1 Department of Signal Processing, Tampere University of Technology, Tampere, Finland

2 Medical Biotechnology, VTT Technical Research Centre and University of Turku, Turku, Finland

3 Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland

4 Computational Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, Helsinki, Finland


BMC Bioinformatics 2009, 10(Suppl 1):S24 doi:10.1186/1471-2105-10-S1-S24

Published: 30 January 2009

Abstract

Background

Gene expression microarray technologies are widely used across most areas of biological and medical research. Comparing and integrating microarray data from different experiments would be very useful, but it is currently very challenging because of differences in experimental and hybridization conditions, as well as in data preprocessing and normalization methods. Furthermore, even for the widely used, industry-standard Affymetrix oligonucleotide microarrays, the various array generations have different probe sets representing different genes, which hinders data integration.

Results

In this study, our objective is to find systematic approaches for normalizing data from different Affymetrix array generations and different laboratories. We compare and assess the accuracy of five normalization methods for Affymetrix gene expression data using 6,926 Affymetrix experiments from five array generations: 1) standardization, 2) housekeeping gene based normalization, 3) equalized quantile normalization, 4) Weibull distribution based normalization, and 5) array generation based gene centering. Our results indicate that performance is best when the data are normalized first within each sample and then between samples with Array Generation based gene Centering (AGC) normalization.
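The two-stage idea (within-sample normalization, then gene centering per array generation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses standardization (one of the five compared methods) as the within-sample step, assumes every sample has already been mapped to a common, identically ordered gene list, and assumes that AGC subtracts the per-generation median of each gene. The abstract does not specify the centering statistic, so the median is a hypothetical choice. The function names are illustrative only.

```python
from statistics import mean, median, pstdev


def standardize_within_sample(sample):
    """Within-sample step: z-score one sample's expression values.

    Standardization is used here only as an example; the other
    within-sample methods compared in the study could be substituted.
    """
    mu = mean(sample)
    sd = pstdev(sample)
    if sd == 0:
        raise ValueError("sample has zero variance; cannot standardize")
    return [(x - mu) / sd for x in sample]


def agc_center(samples, generations):
    """Between-sample step: center each gene within its array generation.

    samples: list of equal-length lists, one per sample, with genes in
             the same (already harmonized) order in every sample.
    generations: one array-generation label per sample.
    Assumption: the per-generation gene median is subtracted; the exact
    centering statistic used by AGC is not stated in the abstract.
    """
    n_genes = len(samples[0])
    centered = [list(s) for s in samples]
    for gen in set(generations):
        idx = [i for i, g in enumerate(generations) if g == gen]
        for j in range(n_genes):
            # Median of gene j over the samples of this generation only.
            center = median(samples[i][j] for i in idx)
            for i in idx:
                centered[i][j] = samples[i][j] - center
    return centered


if __name__ == "__main__":
    # Toy data: generation "B" has a large systematic offset versus "A".
    samples = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0],
               [10.0, 20.0, 30.0], [12.0, 22.0, 32.0]]
    gens = ["A", "A", "B", "B"]
    print(agc_center(samples, gens))
```

In the toy example the generation-specific offset disappears after centering, and each gene's values from generations "A" and "B" end up on the same scale. Mean-centering, or centering followed by scaling, would be equally plausible variants of this sketch.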

Conclusion

We conclude that integrating different Affymetrix datasets with the AGC method yields values that are significantly more comparable across array generations than when no array generation based normalization is used. AGC was the best of the compared methods for normalizing data from several different array generations, and it achieves comparable gene expression values across thousands of samples.