This article is part of the supplement: Highlights from the Fifth International Society for Computational Biology (ISCB) Student Council Symposium
Paired-end read length lower bounds for genome re-sequencing
BMC Bioinformatics 2009, 10(Suppl 13):O2 doi:10.1186/1471-2105-10-S13-O2
Published: 19 October 2009
Next-generation sequencing technology is enabling massive production of high-quality paired-end reads. Many platforms (Illumina Genome Analyzer, Applied Biosystems SOLiD, Helicos HeliScope) can currently produce "ultra-short" paired reads with lengths starting at 25 nt. An analysis by Whiteford et al. [1] of sequencing with unpaired reads shows that ultra-short reads theoretically allow whole-genome re-sequencing and de novo assembly of only small eukaryotic genomes. Chaisson, Brinza and Pevzner [2] recently determined that the paired read length threshold for de novo assembly is ≈ 35 nt for the E. coli genome and ≈ 60 nt for the S. cerevisiae genome. The latter read length is infeasible for some next-generation technologies. Extending the analysis of Whiteford et al., we investigate to what extent genome re-sequencing is feasible with ultra-short paired reads. We obtain theoretical read length lower bounds for re-sequencing that are also applicable to paired-end de novo assembly.
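To make the idea of a read length lower bound concrete, the following is a minimal sketch of a uniqueness count on a reference sequence. It is an illustration only and not the method used in this work: it assumes error-free reads, a single strand (no reverse complements), and a fixed distance `span` between the starts of the two reads in a pair. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def unique_fraction_unpaired(genome: str, k: int) -> float:
    """Fraction of start positions whose length-k read occurs exactly once."""
    n = len(genome) - k + 1
    if n <= 0:
        return 0.0
    reads = [genome[i:i + k] for i in range(n)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / n


def unique_fraction_paired(genome: str, k: int, span: int) -> float:
    """Fraction of start positions whose (left, right) read pair occurs once.

    Assumption: `span` is a fixed offset from the start of the left read
    to the start of the right read; real insert sizes vary.
    """
    n = len(genome) - span - k + 1
    if n <= 0:
        return 0.0
    pairs = [(genome[i:i + k], genome[i + span:i + span + k]) for i in range(n)]
    counts = Counter(pairs)
    return sum(1 for p in pairs if counts[p] == 1) / n


if __name__ == "__main__":
    toy = "AAAACCCCAAAAGGGG"  # hypothetical sequence containing a repeated 4-mer
    print(unique_fraction_unpaired(toy, 4))
    print(unique_fraction_paired(toy, 4, 4))
```

On the toy sequence, the repeated 4-mer makes some unpaired reads ambiguous, while every read pair is unique. Under these simplifying assumptions, a lower bound on read length for a given genome would be the smallest `k` at which the unique fraction reaches a chosen target.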