Open Access Methodology article

Noise filtering and nonparametric analysis of microarray data underscores discriminating markers of oral, prostate, lung, ovarian and breast cancer

Virginie M Aris [1,2,3], Michael J Cody [1], Jeff Cheng [1,2], James J Dermody [3], Patricia Soteropoulos [1,3], Michael Recce [1,2]* and Peter P Tolias [1,3,4]

Author Affiliations

1 Center for Applied Genomics, Public Health Research Institute, Newark, NJ 07103, USA

2 Center for Computational Biology, New Jersey Institute of Technology, Newark, NJ 07103, USA

3 Dept of Microbiology and Molecular Genetics, UMDNJ-New Jersey Medical School, Newark, NJ 07103, USA

4 Current address: Ortho-Clinical Diagnostics, a Johnson & Johnson Company, Raritan, NJ 08869, USA


BMC Bioinformatics 2004, 5:185  doi:10.1186/1471-2105-5-185

Published: 29 November 2004



A major goal of cancer research is to identify discrete biomarkers that specifically characterize a given malignancy. These markers are useful in diagnosis, may identify potential targets for drug development, and can aid in evaluating treatment efficacy and predicting patient outcome. Microarray technology has enabled marker discovery from human cells by permitting measurement of steady-state mRNA levels derived from thousands of genes. However, many challenging issues in the acquisition and analysis of microarray data remain unresolved, such as accounting for both experimental and biological noise, handling transcripts whose expression profiles are not normally distributed, establishing guidelines for the statistical assessment of false positive and false negative rates, and comparing data derived from different research groups. This study addresses these issues using Affymetrix HG-U95A and HG-U133 GeneChip data derived from different research groups.


We present here a simple nonparametric approach, coupled with noise filtering, to identify sets of genes differentially expressed between the normal and cancer states in oral, breast, lung, prostate and ovarian tumors. An important feature of this study is the ability to integrate data from different laboratories, improving the analytical power of the individual results. One of the most interesting findings is the downregulation of genes involved in tissue differentiation.
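To make the general idea concrete, the sketch below pairs a crude noise filter with a rank-based comparison of normal and tumor samples. It is an illustration only and not the noise model or test used in this study: the choice of the Mann-Whitney U test, the fixed `noise_floor` threshold, the `alpha` cutoff, the function names, and the gene labels and intensity values are all assumptions invented for the example.

```python
import math
from statistics import median


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Tied values receive average ranks; no tie correction is applied to
    the variance, so p-values are approximate for heavily tied data.
    """
    combined = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


def differential_genes(normal, tumor, noise_floor=100.0, alpha=0.05):
    """Return (gene, direction, p) for genes that pass the filter and the test.

    noise_floor and alpha are hypothetical settings, not values from the study.
    """
    hits = []
    for gene in normal:
        n, t = normal[gene], tumor[gene]
        # Noise filter (assumed form): skip genes whose median signal stays
        # below the floor in both groups.
        if max(median(n), median(t)) < noise_floor:
            continue
        _, p = mann_whitney_u(n, t)
        if p < alpha:
            direction = "down" if median(t) < median(n) else "up"
            hits.append((gene, direction, p))
    return hits


# Invented toy intensities for three hypothetical probes.
normal = {
    "GENE_A": [500, 520, 480, 510, 530, 495],
    "GENE_B": [20, 30, 25, 22, 28, 26],
    "GENE_C": [300, 310, 290, 305, 295, 315],
}
tumor = {
    "GENE_A": [120, 150, 130, 110, 140, 125],
    "GENE_B": [60, 70, 65, 62, 68, 66],
    "GENE_C": [302, 308, 292, 303, 297, 312],
}
hits = differential_genes(normal, tumor)
```

A rank-based test is used here because it makes no assumption that expression values are normally distributed, which is one of the issues raised above. With this toy input, GENE_B is dropped by the filter even though its two groups are fully separated, which shows how a noise filter can limit false positives among low-intensity probes.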


This study presents the development and application of a noise model that suppresses noise, limits false positives in the results, and allows integration of results from individual studies derived from different research groups.