Research article

Differences in sequencing technologies improve the retrieval of anammox bacterial genome from metagenomes

Fabio Gori1*, Susannah G Tringe2, Gianluigi Folino3, Sacha AFT van Hijum4, Huub JM Op den Camp5, Mike SM Jetten5,6 and Elena Marchiori1

Author Affiliations

1 Radboud University Nijmegen, Institute for Computing and Information Science, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands

2 DOE Joint Genome Institute, Walnut Creek, CA 94598, USA

3 ICAR-CNR, Rende, Italy

4 Center for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands

5 Department of Microbiology, Institute for Water and Wetland Research, Radboud University Nijmegen, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands

6 Delft University of Technology, Department Biotechnology, 2628 BC Delft, The Netherlands

BMC Genomics 2013, 14:7  doi:10.1186/1471-2164-14-7

Published: 16 January 2013

Abstract

Background

Sequencing technologies have different biases, in both single-genome and metagenomic sequencing; these can significantly affect ORF recovery and the inferred population distribution of a metagenome. In this paper we investigate how well different technologies represent information related to an organism of interest in a metagenome, and whether it is beneficial to combine information obtained using different technologies. We comparatively analyze three metagenomic datasets acquired from a sample containing the anammox bacterium Candidatus ‘Brocadia fulgida’ (B. fulgida). These datasets were obtained using Roche 454 FLX sequencing and Sanger sequencing with two different libraries (shotgun and fosmid).

Results

In each dataset, the abundance of the reads annotated to B. fulgida was much lower than the abundance expected from available cell count information. This was due to the overrepresentation of GC-richer organisms, as shown by the GC-content distribution of the reads. Nevertheless, by considering the union of B. fulgida reads over the three datasets, the number of B. fulgida ORFs recovered for at least 80% of their length was twice the number recovered by the best single technology. Indeed, while the taxonomic distributions of reads in the three datasets were similar, the respective sets of B. fulgida ORFs recovered for a large part of their length differed considerably, and the depth-of-coverage patterns of 454 and Sanger were dissimilar.
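The "recovered for at least 80% of their length" criterion, and the effect of pooling reads across datasets, can be illustrated with a minimal Python sketch. It assumes read alignments are given as half-open (start, end) intervals in ORF coordinates; the function names, ORF identifiers and toy intervals are hypothetical and are not taken from the paper, whose actual pipeline is not described in this abstract.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(orf_length, alignments):
    """Fraction of ORF positions covered by at least one aligned read."""
    # Clip each alignment to the ORF and drop alignments that fall outside it.
    clipped = [
        (max(0, start), min(orf_length, end))
        for start, end in alignments
        if end > 0 and start < orf_length
    ]
    covered = sum(end - start for start, end in merge_intervals(clipped))
    return covered / orf_length


def recovered_orfs(orf_lengths, alignments_by_orf, threshold=0.8):
    """Set of ORFs covered for at least `threshold` of their length."""
    return {
        orf
        for orf, length in orf_lengths.items()
        if covered_fraction(length, alignments_by_orf.get(orf, [])) >= threshold
    }


def pool(*datasets):
    """Union of the read alignments of several datasets, per ORF."""
    pooled = {}
    for dataset in datasets:
        for orf, alignments in dataset.items():
            pooled.setdefault(orf, []).extend(alignments)
    return pooled


# Toy data (hypothetical): two ORFs, three datasets.
orf_lengths = {"orfA": 100, "orfB": 200}
reads_454 = {"orfA": [(0, 50)], "orfB": [(0, 170)]}
reads_shotgun = {"orfA": [(40, 90)]}
reads_fosmid = {"orfB": [(150, 200)]}

per_dataset = [
    recovered_orfs(orf_lengths, d)
    for d in (reads_454, reads_shotgun, reads_fosmid)
]
pooled = recovered_orfs(orf_lengths, pool(reads_454, reads_shotgun, reads_fosmid))
```

In this toy case the best single dataset recovers one ORF, while the pooled reads recover both, because partial coverage from different datasets falls on different regions of the same ORF.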

Conclusions

Precautions should be taken to prevent the overrepresentation of GC-rich microbes in the datasets. This overrepresentation, together with the consistency of the taxonomic distributions of reads obtained with different sequencing technologies, suggests that, in general, abundance biases might be mainly due to other steps of the sequencing protocols. The results show that biases against organisms of interest could be compensated for by combining different sequencing technologies, owing to the differences in their genome-level sequencing biases, even when the species is present at similar abundances in the metagenomes.