Research article

Comparison of normalization methods for CodeLink Bioarray data

Wei Wu1*, Nilesh Dave1, George C Tseng2, Thomas Richards1, Eric P Xing3 and Naftali Kaminski1

Author Affiliations

1 Dorothy P. and Richard P. Simmons Center for Interstitial Lung Disease, Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA 15213, USA

2 Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA

3 Center for Automated Learning and Discovery and Language Technology Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA


BMC Bioinformatics 2005, 6:309  doi:10.1186/1471-2105-6-309

Published: 28 December 2005



The quality of microarray data can seriously affect the accuracy of downstream analyses. To reduce variability and enhance signal reproducibility in these data, many normalization methods have been proposed and evaluated, most of them for data obtained from cDNA microarrays and Affymetrix GeneChips. CodeLink Bioarrays are a recently introduced single-color oligonucleotide microarray platform. To date, no published studies have evaluated normalization methods for CodeLink Bioarrays.


We compared five existing normalization approaches in terms of both noise reduction and signal retention: Median (suggested by the manufacturer), CyclicLoess, Quantile, Iset, and Qspline. The methods were applied to two real datasets generated with CodeLink Bioarrays (a time course dataset and a lung disease-related dataset) and assessed using multiple statistical significance tests. Compared with Median, CyclicLoess and Qspline show significant and the most consistent improvements in both variability reduction and signal retention, and CyclicLoess appears to retain more signal than Qspline. Quantile reduces variability more than Median in both datasets, yet fails to consistently retain more signal in the time course dataset. Iset does not improve on Median in either noise reduction or signal enhancement in the time course dataset.
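The abstract names the compared methods without defining them. As a rough illustration of two of them, the sketch below implements per-array median scaling and quantile normalization in plain Python. It assumes that Median means dividing each array's intensities by that array's own median (a common reading of the CodeLink default, assumed here rather than taken from the study), and that Quantile follows the usual rank-averaging scheme. The function names are hypothetical; real analyses would typically work on log-scale data and use an established package such as Bioconductor's preprocessCore (`normalize.quantiles`).

```python
from statistics import median


def median_normalize(arrays):
    """Divide every intensity on an array by that array's median intensity.

    arrays: list of arrays, each a list of positive probe intensities.
    (Hypothetical helper; assumed reading of "Median" normalization.)
    """
    return [[x / median(a) for x in a] for a in arrays]


def quantile_normalize(arrays):
    """Force all arrays to share one empirical intensity distribution.

    The k-th smallest value on every array is replaced by the mean of the
    k-th smallest values across all arrays. Ties are broken by probe
    position, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # For each array, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        column = [0.0] * n
        # Give each probe the reference value for its within-array rank.
        for rank, probe in enumerate(order):
            column[probe] = reference[rank]
        normalized.append(column)
    return normalized


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(median_normalize(arrays))
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

On this toy input, median scaling only aligns the centers of the two arrays (each ends up with median 1), whereas quantile normalization gives both arrays identical sorted values while preserving each probe's within-array rank.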


Median normalization is insufficient both for reducing variability and for retaining signal in CodeLink Bioarray data. CyclicLoess is a more suitable approach for normalizing these data, and it also appears to be the most effective of the five normalization strategies examined.
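CyclicLoess is commonly described as a pairwise, intensity-dependent normalization. For each pair of arrays, the log-ratio M is regressed on the average log-intensity A with a loess smoother, half of the fitted curve is subtracted from one array and added to the other, and the procedure cycles over all pairs several times. The sketch below follows that description under simplifying assumptions: a hand-written local linear smoother with tricube weights and no robustness iterations, plus illustrative `span` and `cycles` values that are not the study's settings. All names are hypothetical; for real analyses, limma's `normalizeCyclicLoess` in R/Bioconductor is an established implementation.

```python
import math
from itertools import combinations


def _lowess(x, y, span=0.4):
    """Local linear fit of y on x with tricube weights (no robustness steps)."""
    n = len(x)
    k = max(2, math.ceil(span * n))  # number of neighbours in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point (itself included).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)  # > 0 because x0 always has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def cyclic_loess(arrays, span=0.4, cycles=3):
    """Pairwise M-vs-A loess normalization; returns log2-scale intensities.

    arrays: list of equal-length lists of positive intensities.
    """
    logs = [[math.log2(v) for v in a] for a in arrays]
    for _ in range(cycles):
        for i, j in combinations(range(len(logs)), 2):
            m = [a - b for a, b in zip(logs[i], logs[j])]        # log-ratio
            avg = [(a + b) / 2 for a, b in zip(logs[i], logs[j])]  # mean log-intensity
            fit = _lowess(avg, m, span)
            # Split the fitted trend between the two arrays, which leaves
            # A unchanged and removes the fit from M.
            logs[i] = [v - f / 2 for v, f in zip(logs[i], fit)]
            logs[j] = [v + f / 2 for v, f in zip(logs[j], fit)]
    return logs


a = [50 * 1.3 ** i for i in range(20)]
b = [v ** 1.1 for v in a]  # intensity-dependent bias relative to a
na, nb = cyclic_loess([a, b], cycles=1)
print(max(abs(x - y) for x, y in zip(na, nb)))
```

Because this toy bias is linear in A on the log scale, the local linear fit captures it almost exactly and one cycle drives M to roughly zero. With three or more arrays, each pairwise adjustment disturbs the others, so several cycles are needed before the arrays settle.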