Open Access, Highly Accessed, Research article

The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies

Roel GW Verhaak1, Frank JT Staal2, Peter JM Valk1, Bob Lowenberg1, Marcel JT Reinders3 and Dick de Ridder2,3

Department of Hematology, Erasmus Medical Center, Rotterdam, The Netherlands

Department of Immunology, Erasmus Medical Center, Rotterdam, The Netherlands

Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, the Netherlands


BMC Bioinformatics 2006, 7:105 doi:10.1186/1471-2105-7-105

Published: 2 March 2006

Abstract

Background

Intensity values measured by Affymetrix microarrays have to be both normalized, which removes non-biological variation so that different microarrays can be compared, and summarized, which generates the final probe set expression values. Various pre-processing techniques, such as dChip, GCRMA, RMA and MAS, have been developed for this purpose. This study assesses how the choice of pre-processing method affects the results of analyses of large Affymetrix datasets. By focusing on practical applications of microarray-based research, it gives biology-oriented researchers insight into the relevance of pre-processing procedures.
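To make the two steps concrete, the following is a minimal sketch of normalization followed by summarization. It is not the implementation of dChip, GCRMA, RMA or MAS: it assumes quantile normalization across arrays and a plain per-probe-set median of log2 intensities (RMA, for instance, uses median polish for summarization and also includes background correction, both omitted here). The toy intensities and the probe set names `ps_A` and `ps_B` are hypothetical, and tied intensities are not given any special treatment.

```python
import numpy as np

def quantile_normalize(x):
    """Give every array (column) of x the same intensity distribution."""
    order = np.argsort(x, axis=0)                # sort order within each array
    ranks = np.argsort(order, axis=0)            # rank of each probe within its array
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values
    return reference[ranks]                      # replace each value by the reference at its rank

def summarize(x, probe_set_ids):
    """Collapse probes to one value per probe set (median of log2 intensities)."""
    log_x = np.log2(x)
    ids = np.asarray(probe_set_ids)
    names = sorted(set(probe_set_ids))
    values = np.vstack([np.median(log_x[ids == n], axis=0) for n in names])
    return names, values

# Hypothetical toy data: 4 probes (rows) x 3 arrays (columns), two probe sets.
intensities = np.array([[120.0, 200.0,  90.0],
                        [ 80.0, 150.0,  60.0],
                        [300.0, 520.0, 210.0],
                        [260.0, 480.0, 190.0]])
probe_sets = ["ps_A", "ps_A", "ps_B", "ps_B"]

normalized = quantile_normalize(intensities)
names, expression = summarize(normalized, probe_sets)
```

After normalization, the sorted intensities of every array are identical, so remaining differences between arrays reflect only the rank of each probe within its array. The summarization step then yields one row per probe set and one column per array.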

Results

Using two publicly available datasets, namely gene-expression data of 285 patients with acute myeloid leukemia (AML, Affymetrix HG-U133A GeneChip) and 42 samples of tumor tissue of the embryonal central nervous system (CNS, Affymetrix HuGeneFL GeneChip), we tested the effect of the four pre-processing strategies mentioned above on (1) expression level measurements, (2) detection of differential expression, (3) cluster analysis and (4) classification of samples. For the AML dataset, the effect of pre-processing is in most cases relatively small compared to other choices made in an analysis; for the CNS dataset, pre-processing has a more profound effect on the outcome. Analyses on individual probe sets, such as testing for differential expression, are affected most; supervised, multivariate analyses such as classification are far less sensitive to pre-processing.

Conclusion

Using two experimental datasets, we show that the choice of pre-processing method has relatively little influence on the final analysis outcome of large microarray studies, whereas it can have important effects on the results of a smaller study. The data source (platform, tissue homogeneity, RNA quality) is potentially more important than the choice of pre-processing method.

