Open Access Methodology article

Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach

Jun Lu, John K Tomfohr and Thomas B Kepler*

Author Affiliations

Department of Biostatistics & Bioinformatics, Duke University, Durham, North Carolina 27708, USA

BMC Bioinformatics 2005, 6:165 doi:10.1186/1471-2105-6-165

Published: 29 June 2005



When testing for differential gene expression across multiple serial analysis of gene expression (SAGE) libraries, it is critical to account for both between-library and within-library variation. Several methods have been proposed, including the t test, the tw test, and an overdispersed logistic regression approach. The merits of these tests, however, have not been fully evaluated, and it remains unclear whether further improvements can be made.


In this article, we introduce an overdispersed log-linear model approach to analyzing SAGE data, and we compare its performance with three other tests: the two-sample t test, the tw test, and a test based on overdispersed logistic regression. Analysis of simulated and real datasets shows that both the log-linear and logistic overdispersion methods generally perform better than the t and tw tests. The log-linear method further outperforms the logistic method, showing equal or higher statistical power over a range of parameter values and data distributions.
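To make the idea of an overdispersed log-linear model concrete, the sketch below fits a quasi-Poisson model for a single tag compared between two groups of libraries, with the log library size as an offset. It is an illustration under stated assumptions, not necessarily the estimator used in the article: the function name, the Pearson-based dispersion estimate, and the choice to keep the dispersion at or above 1 are all assumptions introduced here.

```python
import math


def quasi_poisson_two_group(counts, sizes, groups):
    """Hypothetical helper: two-group log-linear test for one SAGE tag.

    Model (an assumption for illustration):
        log E[y_i] = log(n_i) + b0 + b1 * x_i,   Var(y_i) = phi * E[y_i]
    where y_i is the tag count, n_i the library size and x_i in {0, 1}
    the group label of library i.
    """
    if not (len(counts) == len(sizes) == len(groups)):
        raise ValueError("counts, sizes and groups must have equal length")
    n_lib = len(counts)
    if n_lib < 3:
        raise ValueError("need at least 3 libraries to estimate dispersion")

    # With a single binary covariate and an offset, the Poisson maximum
    # likelihood fit has a closed form: each group's rate is its pooled
    # tag count divided by its pooled library size.
    tot_y = {0: 0.0, 1: 0.0}
    tot_n = {0: 0.0, 1: 0.0}
    for y, n, g in zip(counts, sizes, groups):
        tot_y[g] += y
        tot_n[g] += n
    if min(tot_y.values()) <= 0 or min(tot_n.values()) <= 0:
        raise ValueError("each group needs a positive total count and size")

    rate = {g: tot_y[g] / tot_n[g] for g in (0, 1)}
    log_ratio = math.log(rate[1] / rate[0])  # estimate of b1

    # Pearson chi-square divided by residual degrees of freedom estimates
    # the overdispersion phi (between-library variation beyond Poisson).
    pearson = sum(
        (y - n * rate[g]) ** 2 / (n * rate[g])
        for y, n, g in zip(counts, sizes, groups)
    )
    df = n_lib - 2
    phi = max(pearson / df, 1.0)  # assumption: never below Poisson variance

    # Poisson variance of b1 is 1/tot_y[0] + 1/tot_y[1]; scale it by phi.
    se = math.sqrt(phi * (1.0 / tot_y[0] + 1.0 / tot_y[1]))
    return {"log_ratio": log_ratio, "dispersion": phi,
            "t": log_ratio / se, "df": df}
```

The returned statistic could be compared against a t distribution with `df` degrees of freedom. A call such as `quasi_poisson_two_group([10, 12, 8, 30, 25, 35], [10000] * 6, [0, 0, 0, 1, 1, 1])` gives a log ratio of log 3, since the pooled rates are 30/30000 and 90/30000.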


Overdispersed log-linear models provide an attractive and reliable framework for analyzing SAGE experiments involving multiple libraries. For convenience, an implementation of this method is available through a user-friendly web interface.