Open Access Research article

Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities

Thorsten Stoeck1, Anke Behnke1, Richard Christen2, Linda Amaral-Zettler3, Maria J Rodriguez-Mora4, Andrei Chistoserdov4, William Orsi5 and Virginia P Edgcomb6

1Department of Ecology, University of Kaiserslautern, Kaiserslautern, Germany

2Université de Nice et CNRS UMR 6543, Laboratoire de Biologie Virtuelle, Centre de Biochimie, Parc Valrose, F 06108 Nice, France

3Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA, USA

4University of Louisiana at Lafayette, Lafayette, LA, USA

5Northeastern University, Boston, MA, USA

6Woods Hole Oceanographic Institution, Woods Hole, MA, USA


BMC Biology 2009, 7:72 doi:10.1186/1741-7007-7-72

Published: 3 November 2009

Abstract

Background

Recent advances in sequencing strategies make possible unprecedented depth and scale of sampling for molecular detection of microbial diversity. Two major paradigm-shifting discoveries are the detection of bacterial diversity that is one to two orders of magnitude greater than previous estimates, and the discovery of an exciting 'rare biosphere' of molecular signatures ('species') of poorly understood ecological significance. We applied a high-throughput parallel tag sequencing (454 sequencing) protocol adapted for eukaryotes to investigate protistan community complexity in two contrasting anoxic marine ecosystems (Framvaren Fjord, Norway; Cariaco deep-sea basin, Venezuela). Both sampling sites have previously been scrutinized for protistan diversity by traditional clone library construction and Sanger sequencing. By comparing these clone library data with 454 amplicon library data, we assess the efficiency of high-throughput tag sequencing strategies. Here we present a novel, highly conservative bioinformatic analysis pipeline for the processing of large tag sequence data sets.

Results

The analyses of ca. 250,000 sequence reads revealed that the number of detected Operational Taxonomic Units (OTUs) far exceeded previous richness estimates from the same sites based on clone libraries and Sanger sequencing. More than 90% of this diversity was represented by OTUs with fewer than 10 sequence tags. We detected a substantial number of taxonomic groups, such as Apusozoa, Chrysomerophytes, Centroheliozoa, Eustigmatophytes, hyphochytriomycetes, Ichthyosporea, Oikomonads, Phaeothamniophytes, and rhodophytes, which remained undetected by previous clone library-based diversity surveys of the sampling sites. The most important innovations in our newly developed bioinformatics pipeline are (i) the use of BLASTN with query parameters adjusted for highly variable domains and a complete database of public ribosomal RNA (rRNA) gene sequences for taxonomic assignments of tags; (ii) a clustering of tags at k differences (Levenshtein distance) with a newly developed algorithm enabling very fast OTU clustering for large tag sequence data sets; and (iii) a novel parsing procedure to combine the data from individual analyses.
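To illustrate what clustering tags at k differences means, the following is a minimal Python sketch. It is not the fast algorithm developed for this study, which the abstract does not describe; it is a naive greedy procedure, assumed here purely for illustration, in which each tag joins the first cluster whose seed (first member) lies within k edits. The function names `levenshtein`, `cluster_tags` and `fraction_rare` are hypothetical, and the quadratic run time would not scale to ca. 250,000 reads.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster_tags(tags: List[str], k: int) -> List[List[str]]:
    """Illustrative greedy clustering (an assumption, not the paper's
    method): a tag joins the first cluster whose seed is within k
    differences; otherwise it founds a new cluster."""
    clusters: List[List[str]] = []
    for tag in tags:
        for cluster in clusters:
            if levenshtein(tag, cluster[0]) <= k:
                cluster.append(tag)
                break
        else:
            clusters.append([tag])
    return clusters


def fraction_rare(clusters: List[List[str]], threshold: int = 10) -> float:
    """Share of clusters (OTUs) holding fewer than `threshold` tags."""
    rare = sum(1 for c in clusters if len(c) < threshold)
    return rare / len(clusters)
```

With toy input such as `cluster_tags(["ACGT", "ACGA", "TTTT", "ACGT"], k=1)`, the first, second and fourth tags fall into one OTU and `"TTTT"` forms its own. A helper like `fraction_rare` then gives the kind of summary reported above, namely the proportion of OTUs represented by fewer than 10 tags.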

Conclusion

Our data highlight the magnitude of the under-sampled 'protistan gap' in the eukaryotic tree of life. This study illustrates that our current understanding of the ecological complexity of protist communities, and of the global species richness and genome diversity of protists, is severely limited. Even though 454 pyrosequencing is not a panacea, it allows for more comprehensive insights into the diversity of protistan communities, and combined with appropriate statistical tools, enables improved ecological interpretations of the data and projections of global diversity.


© 1999-2009 BioMed Central Ltd unless otherwise stated. Part of Springer Science+Business Media.