Open Access Methodology article

Bayesian model accounting for within-class biological variability in Serial Analysis of Gene Expression (SAGE)

Ricardo ZN Vêncio1,2*, Helena Brentani3,4, Diogo FC Patrão3 and Carlos AB Pereira1,2

Author Affiliations

1 Statistics Department, Instituto de Matemática e Estatística – Universidade de São Paulo, Rua do Matão 1010, 05508-090 São Paulo, BRAZIL

2 BIOINFO-USP – Núcleo de Pesquisas em Bioinformática da Universidade de São Paulo, Rua do Matão 1010, 05508-090 São Paulo, BRAZIL

3 Ludwig Institute for Cancer Research – São Paulo Branch, Rua Prof. Antônio Prudente 109, 01519-010 São Paulo, BRAZIL

4 Hospital do Câncer A.C. Camargo, Rua Prof. Antônio Prudente 109, 01519-010 São Paulo, BRAZIL

BMC Bioinformatics 2004, 5:119  doi:10.1186/1471-2105-5-119

Published: 31 August 2004

Abstract

Background

An important challenge for transcript-counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for within-class variability. This is the variability due to intrinsic biological differences among sampled individuals of the same class, and it must be considered alongside the variability due to technical sampling error.
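The distinction between the two sources of variability can be illustrated with a quick overdispersion check on one tag across several libraries of the same class. This is a minimal sketch and is not part of the authors' method. It assumes a standard Pearson chi-square dispersion ratio: if all libraries shared one tag proportion, so that only technical binomial sampling applied, the ratio would be roughly 1. Values well above 1 suggest extra between-individual variability. The function names and the counts below are hypothetical toy values, not data from the paper. The chi-square approximation is also rough when expected counts are small.

```python
def pooled_proportion(counts, sizes):
    # "Pseudo-library" estimate: sum the tag counts and the library sizes of one class.
    return sum(counts) / sum(sizes)


def dispersion_ratio(counts, sizes):
    # Pearson chi-square divided by its degrees of freedom (k - 1).
    # Under pure technical (binomial) sampling with one shared proportion, this is ~1.
    p = pooled_proportion(counts, sizes)
    x2 = sum((c - n * p) ** 2 / (n * p * (1.0 - p)) for c, n in zip(counts, sizes))
    return x2 / (len(counts) - 1)


# Hypothetical counts of one tag in four libraries of the same class (toy numbers).
sizes = [50000, 60000, 55000, 45000]
counts = [5, 30, 8, 40]
print(pooled_proportion(counts, sizes))
print(dispersion_ratio(counts, sizes))
```

Pooling all four libraries collapses them into a single binomial sample. A large dispersion ratio is a sign that this simplification discards real biological spread.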

Results

We introduce a Bayesian model that accounts for within-class variability by means of a mixture distribution. We show that the previously available approaches, aggregation into pools ("pseudo-libraries") and the Beta-Binomial model, are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags that appear differentially expressed with high significance when within-class variability is ignored, but are clearly much less significant once it is accounted for.
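The relationship between the Beta-Binomial model and plain pooling can be seen from the variance of a tag count. This is a minimal sketch of the Beta-Binomial special case only, not the authors' full Bayesian mixture model. It uses a parameterization chosen here for illustration: mean proportion `p` and concentration `s = alpha + beta`, so that `alpha = p*s` and `beta = (1-p)*s`. As `s` grows, the mixing distribution concentrates at `p` and the variance approaches the binomial value implied by pooling. For a finite `s`, the variance is inflated by the factor `(s + n) / (s + 1)`. The function names are hypothetical.

```python
def binomial_var(n, p):
    # Count variance when every library shares the same tag proportion p
    # (the assumption behind pooling into a "pseudo-library").
    return n * p * (1.0 - p)


def beta_binomial_var(n, p, s):
    # Count variance when the tag proportion varies between individuals as
    # Beta(alpha, beta), with mean p = alpha / (alpha + beta) and s = alpha + beta.
    # Equal to n*alpha*beta*(s + n) / (s**2 * (s + 1)).
    return n * p * (1.0 - p) * (s + n) / (s + 1.0)


def variance_inflation(n, s):
    # Ratio beta_binomial_var / binomial_var; independent of p.
    return (s + n) / (s + 1.0)


# Hypothetical library of 50,000 tags and a tag with mean proportion 1e-4.
n, p = 50000, 1e-4
print(binomial_var(n, p))
print(beta_binomial_var(n, p, 1e4))
print(variance_inflation(n, 1e4))
```

A statistic that divides a count difference by a standard deviation is shrunk by roughly the square root of the inflation factor once between-individual variability is included. This is consistent with the abstract's observation that some tags lose much of their apparent significance.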

Conclusion

Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available under the GNU GPL (copyleft). It can be used through a user-friendly web-based online tool or as R language scripts from the supplemental website.