Research article (Open Access, Highly Accessed)

An algorithm for automated closure during assembly

Sergey Koren*, Jason R Miller, Brian P Walenz and Granger Sutton

Author Affiliations

The J. Craig Venter Institute, 9712 Medical Center Drive, Rockville MD 20850, USA

BMC Bioinformatics 2010, 11:457 doi:10.1186/1471-2105-11-457

Published: 10 September 2010

Abstract

Background

Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local re-assembly of gap regions. An obvious alternative uses de novo assembly of all the reads.

Results

A procedure called the bounding read algorithm was developed for assembly of shotgun reads plus finishing reads and their constraints, targeting repeat regions. The algorithm was implemented within the Celera Assembler software and its pyrosequencing-specific variant, CABOG. The implementation was tested on Sanger and pyrosequencing data from six genomes. The bounding read assemblies were compared to assemblies from two other methods on the same data. The algorithm generates improved assemblies of repeat regions, closing and tiling some gaps while degrading none.

Conclusions

The algorithm is useful for small-genome automated finishing projects. Our implementation is available as open source from http://wgs-assembler.sourceforge.net under the GNU General Public License.