Open Access Methodology article

Orthogonal projections to latent structures as a strategy for microarray data normalization

Max Bylesjö1*, Daniel Eriksson2, Andreas Sjödin3, Stefan Jansson3, Thomas Moritz2 and Johan Trygg1

Author Affiliations

1 Research group for Chemometrics, Department of Chemistry, Umeå University, SE-901 87 Umeå, Sweden

2 Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-901 83 Umeå, Sweden

3 Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, SE-901 87 Umeå, Sweden

BMC Bioinformatics 2007, 8:207  doi:10.1186/1471-2105-8-207

Published: 18 June 2007

Abstract

Background

During the generation of microarray data, various forms of systematic bias are frequently introduced, which limit the accuracy and precision of the results. In order to properly estimate biological effects, these biases must be identified and removed.

Results

We introduce a normalization strategy for multi-channel microarray data based on orthogonal projections to latent structures (OPLS), a multivariate regression method. We illustrate the effect of applying the normalization methodology to single-channel Affymetrix data as well as dual-channel cDNA data. We provide a parallel comparison to a wide range of commonly employed normalization methods with diverse properties and strengths, based on sensitivity and specificity estimated from external (spike-in) controls. On the illustrated data sets, the OPLS normalization strategy exhibits the highest average true negative and true positive rates among the evaluated methods.
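
The sensitivity (true positive rate) and specificity (true negative rate) mentioned above can be computed from spike-in controls as in the following minimal sketch. It assumes that each control element has a known status (truly differential or not) and that some downstream test has produced a binary call per element. The function name `true_rates` and both input arrays are hypothetical; this section does not state how the calls were made in the study.

```python
import numpy as np

def true_rates(called, truth):
    """Return (true positive rate, true negative rate).

    called: boolean-like array, True where an element was called differential
            (hypothetical output of some downstream test).
    truth:  boolean-like array, True where the spike-in control is known
            to be differential.
    """
    called = np.asarray(called, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = np.sum(called & truth)      # differential and called
    fn = np.sum(~called & truth)     # differential but missed
    tn = np.sum(~called & ~truth)    # unchanged and not called
    fp = np.sum(called & ~truth)     # unchanged but called

    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return tpr, tnr
```

The sketch assumes that both classes are present among the controls; otherwise one of the denominators is zero.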

Conclusion

The OPLS methodology identifies joint variation within biological samples to enable the removal of sources of variation that are uncorrelated with (orthogonal to) the within-sample variation. This ensures that structured variation related to the underlying biological samples is separated from the remaining, bias-related sources of systematic variation. As a consequence, the methodology does not require any explicit knowledge regarding the presence or characteristics of particular biases. Furthermore, there is no underlying assumption that the majority of elements should be non-differentially expressed, making it applicable to specialized boutique arrays.
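
To illustrate the idea of removing variation that is orthogonal to a signal of interest, the following is a minimal sketch of the generic single-response OPLS filtering step. It is not the authors' normalization procedure, which operates on multi-channel data and is not specified in this section. The function name `opls_filter`, the response vector `y`, and the parameter `n_ortho` are assumptions made for illustration.

```python
import numpy as np

def opls_filter(X, y, n_ortho=1, tol=1e-12):
    """Remove up to n_ortho y-orthogonal components from X.

    X: (n_samples, n_features) data matrix.
    y: (n_samples,) response; assumed to be non-constant.
    Returns the filtered (column-centered) matrix and a list of the
    orthogonal score vectors that were removed.
    """
    X = X - X.mean(axis=0)   # column-center the data
    y = y - y.mean()         # center the response
    ortho_scores = []
    for _ in range(n_ortho):
        w = X.T @ y                      # predictive weight direction
        w /= np.linalg.norm(w)
        t = X @ w                        # predictive scores
        p = X.T @ t / (t @ t)            # predictive loadings
        w_o = p - (w @ p) * w            # part of p orthogonal to w
        norm = np.linalg.norm(w_o)
        if norm < tol:                   # no y-orthogonal structure left
            break
        w_o /= norm
        t_o = X @ w_o                    # orthogonal scores (uncorrelated with y)
        p_o = X.T @ t_o / (t_o @ t_o)    # orthogonal loadings
        X = X - np.outer(t_o, p_o)       # deflate: remove orthogonal variation
        ortho_scores.append(t_o)
    return X, ortho_scores
```

Because each orthogonal score vector is uncorrelated with `y`, the deflation leaves the covariance between the data and `y` unchanged while discarding structured variation that is unrelated to it.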