Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

This article is part of the supplement: Proceedings of the Third Annual RECOMB Satellite Workshop on Massively Parallel Sequencing (RECOMB-seq 2013)

Open Access Proceedings

Optimal assembly for high throughput shotgun sequencing

Guy Bresler*, Ma'ayan Bresler and David Tse

Author Affiliations

Department of EECS, UC Berkeley, California, USA

For all author emails, please log on.

BMC Bioinformatics 2013, 14(Suppl 5):S18  doi:10.1186/1471-2105-14-S5-S18

Published: 9 July 2013

Abstract

We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Brujin graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization.