Open Access Highly Accessed Research article

Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools

Veljo Kisand1,2* and Teresa Lettieri2

Author affiliations

1 Institute of Technology, Tartu University, Nooruse 1, Tartu 50411, Estonia

2 European Commission, Joint Research Centre, Institute for Environment and Sustainability, Rural, Water and Ecosystem Resources Unit, TP 270, Via E. Fermi, 2749, Ispra, VA, 21027, Italy

Citation and License

BMC Genomics 2013, 14:211  doi:10.1186/1471-2164-14-211

Published: 1 April 2013

Abstract

Background

De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bp), which are presumed to aid in the analysis of uncharacterized genomes. The array of tools for processing NGS data is increasingly free and open source, and these tools are often adopted both for their high quality and for their role in promoting academic freedom.

Results

The error rate of pyrosequencing the Alcanivorax borkumensis genome was high enough that thousands of insertions and deletions were artificially introduced into the finished genome. Despite high coverage (~30-fold), the reads did not allow the reference genome to be fully mapped. Reads from error-prone regions had low quality or low coverage, or were missing altogether. The main defects of the reference mapping were artificial indels introduced into contigs where the consensus was below 100%, and disrupted gene calling caused by the resulting artificial stop codons. No assembler produced a de novo assembly comparable to the reference mapping. Automated annotation tools performed similarly on reference-mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes.
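The link between an artificial indel and an artificial stop codon can be illustrated with a minimal sketch. The sequence below is a hypothetical toy ORF, not data from the study, and the helper name `first_stop_index` is an assumption made for illustration. The sketch inserts one extra base into a homopolymer run (a typical pyrosequencing error) and shows that the shifted reading frame reaches a stop codon earlier than the original frame does.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(seq):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Hypothetical toy ORF: ATG AAA GCT GAC CTG TAA
reference = "ATGAAAGCTGACCTGTAA"

# Simulated homopolymer error: one extra A appended to the AAA run.
# The frame becomes ATG AAA AGC TGA CCT GTA (A)
with_insertion = reference[:6] + "A" + reference[6:]

print(first_stop_index(reference))       # stop codon at the true end of the ORF
print(first_stop_index(with_insertion))  # premature stop created by the frameshift
```

A gene caller run on the second sequence would report a truncated CDS, which is the kind of disruption the results describe.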

Conclusions

Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not a high priority, and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium-length short reads into long contigs (>97-98% genome coverage). A notable weakness of pyrosequencing technology is the quality of base calling, which leads to conflicting bases between single reads at the same nucleotide position. Regardless, draft whole genomes that are not finished and remain fragmented into tens of contigs allow one to characterize unknown bacteria with modest effort.
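The problem of conflicting bases at one nucleotide position can be sketched as a per-column majority vote. This is an illustration only, not the consensus method of any assembler used in the study; the function name `column_consensus`, the `-` gap symbol, and the 0.5 threshold are all assumptions.

```python
from collections import Counter


def column_consensus(bases, min_fraction=0.5):
    """Return (consensus_base, agreement) for the bases of one alignment column.

    `bases` holds one symbol per read covering the position; '-' marks a gap.
    If the most common symbol falls below `min_fraction`, 'N' is reported.
    """
    if not bases:
        raise ValueError("column has no read coverage")
    symbol, count = Counter(bases).most_common(1)[0]
    agreement = count / len(bases)
    return (symbol if agreement >= min_fraction else "N"), agreement


# Nine reads agree on A, one read has a gap: consensus A, but below 100% agreement.
print(column_consensus(list("AAAAAAAAA-")))

# Four reads disagree completely: no symbol reaches the threshold.
print(column_consensus(list("ACGT")))
```

Columns with agreement below 1.0 are the positions where a spurious insertion or deletion can enter the consensus sequence.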

Keywords:
Reference mapping; De novo sequencing; De novo assembly; Automated annotation; Marine bacteria