Open Access Highly Accessed Methodology article

Base calling for high-throughput short-read sequencing: dynamic programming solutions

Shreepriya Das* and Haris Vikalo

Author Affiliations

Electrical and Computer Engineering Department, The University of Texas at Austin, Austin, Texas 78712, USA

BMC Bioinformatics 2013, 14:129  doi:10.1186/1471-2105-14-129

Published: 15 April 2013

Abstract

Background

Next-generation DNA sequencing platforms can generate millions of reads in a matter of days at rapidly falling costs. Despite the proliferation of these platforms and continued technological improvements, their performance remains adversely affected by imperfections in the underlying biochemical and signal acquisition procedures. To mitigate these effects, various techniques, including statistical methods, are used to improve the read lengths and accuracy of these systems. Developing high-performing base calling algorithms that are computationally efficient and scalable remains an ongoing challenge.

Results

We develop model-based statistical methods for fast and accurate base calling on Illumina’s next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model that enables a dynamic programming formulation of the base calling problem. Forward-backward and soft-output Viterbi algorithms are developed, and their performance and complexity are investigated and compared with existing state-of-the-art base calling methods for this platform. A C implementation of our algorithm, named Softy, can be downloaded from https://sourceforge.net/projects/dynamicprog.
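
To make the dynamic programming idea concrete, the following is a minimal sketch of the generic forward-backward recursion on a toy hidden Markov model. It is not the parametric model or the Softy implementation described in this article. The four-state chain over bases, the Gaussian emission around a one-hot intensity vector, the noise level `sigma`, and the function names `emission`, `forward_backward` and `call_bases` are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def emission(obs, state, sigma=0.5):
    """Unnormalized Gaussian likelihood of a 4-channel intensity vector.

    Assumption for illustration: the ideal signal for a base is a one-hot
    vector, corrupted by isotropic Gaussian noise with std. dev. sigma.
    """
    sq = sum((x - (1.0 if k == state else 0.0)) ** 2 for k, x in enumerate(obs))
    return math.exp(-sq / (2.0 * sigma ** 2))


def forward_backward(observations, trans, prior):
    """Return per-cycle posterior probabilities over the hidden states."""
    n, s = len(observations), len(prior)
    e = [[emission(o, k) for k in range(s)] for o in observations]

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha = []
    for t in range(n):
        if t == 0:
            a = [prior[k] * e[0][k] for k in range(s)]
        else:
            a = [e[t][k] * sum(alpha[t - 1][j] * trans[j][k] for j in range(s))
                 for k in range(s)]
        z = sum(a)
        alpha.append([x / z for x in a])

    # Backward pass, also rescaled at every step.
    beta = [[1.0] * s for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[k][j] * e[t + 1][j] * beta[t + 1][j] for j in range(s))
             for k in range(s)]
        z = sum(b)
        beta[t] = [x / z for x in b]

    # Combine both passes and normalize to obtain the posteriors.
    posteriors = []
    for t in range(n):
        g = [alpha[t][k] * beta[t][k] for k in range(s)]
        z = sum(g)
        posteriors.append([x / z for x in g])
    return posteriors


def call_bases(posteriors):
    """Pick the most probable base at each cycle."""
    return "".join(BASES[max(range(len(p)), key=lambda k: p[k])]
                   for p in posteriors)


# Toy input: four cycles of noisy 4-channel intensities (made-up values).
toy_obs = [
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.2, 0.9, 0.1],
    [0.1, 0.0, 0.1, 1.0],
]
uniform_trans = [[0.25] * 4 for _ in range(4)]
uniform_prior = [0.25] * 4
toy_post = forward_backward(toy_obs, uniform_trans, uniform_prior)
toy_calls = call_bases(toy_post)
```

The per-cycle posteriors are the "soft" output: besides the called base, they give a confidence for each call. With a non-uniform `prior` or `trans`, the same recursion folds prior knowledge about the sequence into the calls.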

Conclusion

We demonstrate the high accuracy and speed of the proposed methods on reads obtained using Illumina’s Genome Analyzer II and HiSeq2000. In addition to performing reliable and fast base calling, the developed algorithms enable the incorporation of prior knowledge, which can be used for parameter estimation and is potentially beneficial in various downstream applications.