Open Access Research article

The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison

Allan A Sioson1, Shrinivasrao P Mane2, Pinghua Li3, Wei Sha4, Lenwood S Heath1, Hans J Bohnert3 and Ruth Grene2

1 Department of Computer Science, Virginia Tech, Blacksburg, USA

2 Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, USA

3 Department of Plant Biology and Department of Crop Sciences, University of Illinois, Urbana, USA

4 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, USA


BMC Bioinformatics 2006, 7:215 doi:10.1186/1471-2105-7-215

Published: 20 April 2006

Abstract

Background

Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data.
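To make the two input types concrete, the following is a minimal sketch of how an IIV and an MIV could be derived for a single spot. It assumes that IIV is simply the mean spot intensity multiplied by the spot pixel count; the exact TM4 computation (for example, any background subtraction) is not given here. The function names and pixel values are hypothetical.

```python
from statistics import median

def integrated_intensity(mean_intensity, pixel_count):
    """Assumed IIV: mean spot intensity times the number of spot pixels."""
    return mean_intensity * pixel_count

def median_intensity(pixel_values):
    """MIV: the median of the pixel intensities inside the spot."""
    return median(pixel_values)

# Hypothetical pixel intensities for one spot; the last pixel is an outlier.
spot_pixels = [100, 102, 98, 101, 99, 1000]

mean_value = sum(spot_pixels) / len(spot_pixels)
iiv = integrated_intensity(mean_value, len(spot_pixels))
miv = median_intensity(spot_pixels)

print(iiv)  # 1500.0
print(miv)  # 100.5
```

In this toy spot the single outlying pixel dominates the mean-based IIV, while the MIV stays close to the typical pixel value. This illustrates how the two inputs differ, not why the analyses in the paper produce different results.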

Results

The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data.
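As a rough illustration of the t-test step, the sketch below computes a plain one-sample t statistic on the replicate log2 ratios of a single gene, testing whether the mean log ratio differs from zero. It is not the TM4 or Expresso implementation; the function names and intensity values are hypothetical, and the normalization and filtering steps are assumed to have been applied already.

```python
from math import log2, sqrt
from statistics import mean, stdev

def log_ratios(treated, control):
    """Per-replicate log2(treated / control) for one gene."""
    return [log2(t / c) for t, c in zip(treated, control)]

def one_sample_t(values):
    """t statistic for the null hypothesis that the mean of values is zero."""
    n = len(values)
    return mean(values) / (stdev(values) / sqrt(n))

# Hypothetical normalized intensities for one gene across three replicates.
treated = [400.0, 800.0, 1600.0]
control = [100.0, 100.0, 100.0]

ratios = log_ratios(treated, control)   # [2.0, 3.0, 4.0]
t_stat = one_sample_t(ratios)           # 3 * sqrt(3), about 5.196
print(ratios, t_stat)
```

A p-value would then come from the t distribution with n - 1 degrees of freedom, followed by a multiple-testing adjustment across genes. Both steps are omitted here.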

Conclusion

The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity.

