
This article is part of the supplement: Selected articles from The 8th Annual Biotechnology and Bioinformatics Symposium (BIOT-2011)

Open Access Research

BM-BC: a Bayesian method of base calling for Solexa sequence data

Yuan Ji1*, Riten Mitra2*, Fernando Quintana3, Alejandro Jara3, Peter Mueller4, Ping Liu5, Yue Lu6 and Shoudan Liang7

Author affiliations

1 Center for Clinical and Research Informatics, Northshore University HealthSystem, Evanston, IL 60091, USA

2 ICES, University of Texas at Austin, Austin, TX 78705, USA

3 Department of Statistics, Pontificia Universidad Católica de Chile, Casilla 306, Correo 22, Santiago, Chile

4 Department of Mathematics, The University of Texas at Austin, Austin, TX 78705, USA

5 Abbott Molecular Inc., Des Plaines, IL 60018, USA

6 Department of Leukemia, The University of Texas, M. D. Anderson Cancer Center, Houston, TX 77030, USA

7 Department of Bioinformatics & Computational Biology, The University of Texas, M. D. Anderson Cancer Center, Houston, TX 77030, USA


Citation and License

BMC Bioinformatics 2012, 13(Suppl 13):S6  doi:10.1186/1471-2105-13-S13-S6

Published: 24 August 2012

Abstract

Base calling is a critical step in the Solexa next-generation sequencing procedure. It compares the intensity measurements that reflect the signal strength of the four possible bases (A, C, G, T) at each position of a read (i.e., each sequencing cycle), and outputs estimates of the true sequences of short DNA or RNA reads. We present BM-BC, a Bayesian method of base calling for Solexa-GA sequencing data. BM-BC is built on a hierarchical model that accounts for three sources of noise known to affect the accuracy of base calls: fading, phasing, and cross-talk between channels. We show that the new method improves the precision of base calling compared with currently leading methods. Furthermore, the proposed method provides a probability score that measures the confidence of each base call. This score can be used to estimate the false discovery rate (FDR) of the base calls or to rank the estimated DNA sequences by precision, which in turn can be useful for downstream analyses such as sequence alignment.
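To make the ideas in the abstract concrete, here is a minimal, hypothetical sketch. It is not the BM-BC hierarchical model, which estimates its parameters from the data. Instead, it shows how a per-call probability score can arise once fading, phasing and cross-talk are written into a likelihood. Assumptions, all illustrative: the noise parameters (`fades`, `crosstalk`, `phase`, `sigma`) are known constants, noise is Gaussian, the prior over A/C/G/T is uniform, and phasing is approximated by plugging in the previous call as the lagging base. The FDR estimate is the common Bayesian one, the mean of (1 − p) over accepted calls, and may differ from the paper's exact procedure. All function and parameter names are hypothetical.

```python
import math

BASES = "ACGT"


def expected_signal(base, prev, fade, crosstalk, phase):
    """Mean 4-channel intensity for `base` at one cycle (toy model).

    Phasing: a fraction `phase` of the signal lags one cycle behind and
    comes from the previous base `prev` (None at the first cycle).
    Cross-talk: crosstalk[k][j] is the share of base j's dye seen in channel k.
    Fading: the whole vector is scaled by the cycle-specific factor `fade`.
    """
    template = [0.0] * 4
    if prev is None:
        template[base] = 1.0
    else:
        template[base] += 1.0 - phase
        template[prev] += phase
    return [fade * sum(crosstalk[k][j] * template[j] for j in range(4))
            for k in range(4)]


def call_read(intensities, fades, crosstalk, phase, sigma):
    """Return (called base, posterior probability) for each cycle.

    Uses Gaussian noise with s.d. `sigma`, a uniform prior over the four
    bases, and the previous call as the lagging base (an approximation).
    """
    calls, prev = [], None
    for y, fade in zip(intensities, fades):
        loglik = []
        for b in range(4):
            mu = expected_signal(b, prev, fade, crosstalk, phase)
            sq = sum((yk - mk) ** 2 for yk, mk in zip(y, mu))
            loglik.append(-sq / (2 * sigma ** 2))
        top = max(loglik)
        weights = [math.exp(ll - top) for ll in loglik]  # numerically stable softmax
        total = sum(weights)
        post = [w / total for w in weights]
        best = max(range(4), key=lambda b: post[b])
        calls.append((BASES[best], post[best]))
        prev = best
    return calls


def estimated_fdr(probs, threshold):
    """Estimated FDR among calls with probability >= threshold:
    the average posterior probability that an accepted call is wrong."""
    accepted = [p for p in probs if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Usage: simulate noise-free intensities for the read "ACGT", then call it.
crosstalk = [[1.0, 0.2, 0.0, 0.0],
             [0.2, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.1],
             [0.0, 0.0, 0.1, 1.0]]
fades = [1.0, 0.95, 0.9, 0.85]
intensities, prev = [], None
for b, f in zip([0, 1, 2, 3], fades):
    intensities.append(expected_signal(b, prev, f, crosstalk, 0.05))
    prev = b
calls = call_read(intensities, fades, crosstalk, phase=0.05, sigma=0.1)
print("".join(base for base, _ in calls))  # prints "ACGT"
```

Plugging in the previous call keeps each cycle's posterior a simple four-way computation. A full model such as BM-BC would instead treat the neighbouring bases and the noise parameters as unknowns.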