Statistical monitoring of weak spots for improvement of normalization and ratio estimates in microarrays
Department of Arthritis and Immunology, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
BMC Bioinformatics 2004, 5:53 doi:10.1186/1471-2105-5-53Published: 5 May 2004
Several aspects of microarray data analysis are dependent on identification of genes expressed at or near the limits of detection. For example, regression-based normalization methods rely on the premise that most genes in compared samples are expressed at similar levels and therefore require accurate identification of nonexpressed genes (additive noise) so that they can be excluded from the normalization procedure. Moreover, key regulatory genes can maintain stringent control of a given response at low expression levels. If arbitrary cutoffs are used for distinguishing expressed from nonexpressed genes, some of these key regulatory genes may be unnecessarily excluded from the analysis. Unfortunately, no accurate method for differentiating additive noise from genes expressed at low levels is currently available.
We developed a multistep procedure for analysis of mRNA expression data that robustly identifies the additive noise in a microarray experiment. This analysis is predicated on the fact that additive noise signals can be accurately identified by both distribution and statistical analysis.
Identification of additive noise in this manner allows exclusion of noncorrelated weak signals from regression-based normalization of compared profiles thus maximizing the accuracy of these methods. Moreover, genes expressed at very low levels can be clearly identified due to the fact that their expression distribution is stable and distinguishable from the random pattern of additive noise.