Open Access Methodology article

The effects of normalization on the correlation structure of microarray data

Xing Qiu1, Andrew I Brooks2, Lev Klebanov1,3 and Andrei Yakovlev1

1Department of Biostatistics and Computational Biology, University of Rochester, New York 14642, USA

2Functional Genomics Center, University of Rochester, 601 Elmwood Avenue, Rochester, New York 14642, USA

3Department of Probability and Statistics, Charles University, Sokolovska 83, Praha-8, CZ-18675, Czech Republic

BMC Bioinformatics 2005, 6:120 doi:10.1186/1471-2105-6-120

Published: 16 May 2005

Abstract

Background

Stochastic dependence between gene expression levels in microarray data is of critical importance for methods of statistical inference that pool test statistics across genes. It is frequently assumed that dependence between genes (or tests) is sufficiently weak to justify the proposed methods of testing for differentially expressed genes. The potential impact of between-gene correlations on the performance of such methods has yet to be explored.

Results

The paper presents a systematic study of correlation between the t-statistics associated with different genes. We report the effects of four different normalization methods using a large set of microarray data on childhood leukemia in addition to several sets of simulated data. Our findings help decipher the correlation structure of microarray data before and after the application of normalization procedures.

Conclusion

Long-range correlation in microarray data manifests itself in thousands of genes that are heavily correlated with a given gene in terms of the associated t-statistics. By using normalization methods, it is possible to significantly reduce the correlation between the t-statistics computed for different genes. Normalization procedures affect both the true correlation, stemming from gene interactions, and the spurious correlation induced by random noise. When applied to real-world biological data sets, normalization procedures are unable to completely remove correlation between the test statistics. The long-range correlation structure also persists in normalized data.

