Open Access Methodology article

PyroTRF-ID: a novel bioinformatics methodology for the affiliation of terminal-restriction fragments using 16S rRNA gene pyrosequencing data

David G Weissbrodt1, Noam Shani1, Lucas Sinclair2,3,5, Grégory Lefebvre2,3,6, Pierre Rossi4, Julien Maillard1, Jacques Rougemont2,3 and Christof Holliger1*

Author Affiliations

1 Ecole Polytechnique Fédérale de Lausanne, School of Architecture, Civil and Environmental Engineering, Laboratory for Environmental Biotechnology, Station 6, Lausanne, 1015, Switzerland

2 Ecole Polytechnique Fédérale de Lausanne, School of Life Sciences, Bioinformatics and Biostatistics Core Facility, Lausanne, Switzerland

3 Swiss Institute of Bioinformatics, Lausanne, Switzerland

4 Ecole Polytechnique Fédérale de Lausanne, School of Architecture, Civil and Environmental Engineering, Central Environmental Molecular Biology Laboratory, Lausanne, Switzerland

5 Uppsala University, Limnology Department, Evolutionary Biology Centre, Uppsala, Sweden

6 Nestlé Institute of Health Sciences, Lausanne, Switzerland

BMC Microbiology 2012, 12:306  doi:10.1186/1471-2180-12-306

Published: 27 December 2012

Additional files

Additional file 1:

Quality plots generated for samples pyrosequenced with the LowRA (>3,000 reads) and HighRA (>10,000 reads) methods. Sequence quality PHRED scores over all bases (A): the PHRED score is a logarithmic transform of the base-calling error probability, $\mathrm{PHRED} = -10 \log_{10} P_{\mathrm{error}}$, or equivalently $P_{\mathrm{error}} = 10^{-\mathrm{PHRED}/10}$. Box plots represent the distribution of read quality at each sequence length, and the black curve represents the mean sequence quality as a function of sequence length. Distribution of the mean PHRED score per pyrosequencing read (B). Distribution of sequence lengths over all pyrosequencing reads (C). Only sequences between 300 and 500 bp were kept for dT-RFLP analysis.
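As a minimal illustration of the quality relation and the length filter described above, the Python sketch below converts between PHRED scores and base-calling error probabilities and applies the 300–500 bp window. The function names are hypothetical, and treating both window bounds as inclusive is an assumption.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Base-calling error probability: P_error = 10^(-PHRED/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse relation: PHRED = -10 * log10(P_error)."""
    return -10 * math.log10(p)


def mean_quality(quals: Sequence[int]) -> float:
    """Mean PHRED score of a single read (the per-read value summarised in panel B)."""
    return sum(quals) / len(quals)


def keep_read(seq: str, min_len: int = 300, max_len: int = 500) -> bool:
    """Length filter before dT-RFLP analysis; inclusive bounds are an assumption."""
    return min_len <= len(seq) <= max_len
```

For example, a PHRED score of 20 corresponds to a 1% error probability and a score of 30 to 0.1%.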

Format: PDF Size: 163KB

Additional file 2:

Assessment of mapping performance with pyrosequencing datasets denoised without a minimum read-length cutoff (0–500 bp) and with one (300–500 bp). Examples are given for the groundwater sample GRW01, the flocculent activated sludge sample FLS01 and the aerobic granular sludge sample AGS01. After denoising by either method, each dataset was mapped against a reference database with MG-RAST [66], with no cutoff on e-value, minimum identity or minimum alignment length. Because 35–45% of the sequences remained unassigned with Greengenes, the Ribosomal Database Project (RDP) [67] was used as the reference database for this assessment (only 4% of sequences unassigned). Correlations between the bacterial community profiles obtained with both denoising methods and both reference databases were analyzed with STAMP [68].
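STAMP [68] performed the actual profile comparison. The sketch below only illustrates the general idea of correlating two community profiles, assuming a Pearson correlation computed on relative abundances and treating a taxon absent from one profile as having zero abundance; the function names are hypothetical and this is not STAMP's implementation.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts per taxon into relative abundances."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def pearson_profiles(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Pearson r between two profiles; taxa missing from one profile count as 0."""
    ra, rb = relative_abundance(a), relative_abundance(b)
    taxa = sorted(set(ra) | set(rb))
    x = [ra.get(t, 0.0) for t in taxa]
    y = [rb.get(t, 0.0) for t in taxa]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)
```

Because counts are first converted to relative abundances, two datasets of different sequencing depth but identical composition give r = 1.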

Format: PDF Size: 375KB

Additional file 3:

Comparison of the distributions of the Smith–Waterman (SW) mapping score and of the identity score traditionally used by microbial ecologists in environmental sciences for the phylogenetic affiliation of sequences. The distributions of the absolute SW score (A) and of the SW score normalized by read length (B), obtained after mapping 15 pyrosequencing datasets with the BWA-SW algorithm implemented in the PyroTRF-ID methodology, are compared with the distribution of the identity score obtained after annotating 10 pyrosequencing datasets with MG-RAST [66] (C). Greengenes was used as the annotation source in all cases. Each distribution is characterized by its median (m), average (avg) and standard deviation (s).
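To show what a length-normalized SW score means, the sketch below computes a plain Smith–Waterman local alignment score with a linear gap penalty and divides it by the read length. This is not BWA-SW, which relies on heuristics and affine gap penalties; the scoring values (match +1, mismatch −3, gap −5) and function names are assumptions chosen for illustration.

```python
def sw_score(read: str, ref: str, match: int = 1, mismatch: int = 3, gap: int = 5) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty, assumed scores)."""
    prev = [0] * (len(ref) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else -mismatch)
            up = prev[j] - gap        # gap in the reference
            left = curr[j - 1] - gap  # gap in the read
            curr[j] = max(0, diag, up, left)  # local alignment: never below zero
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_sw(read: str, ref: str) -> float:
    """SW score divided by read length, comparable across reads of different lengths."""
    return sw_score(read, ref) / len(read)
```

With a match score of 1, a read that aligns perfectly over its full length obtains a normalized score of 1.0 regardless of its length, which is why normalization makes scores from 300-bp and 500-bp reads comparable.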

Format: PDF Size: 43KB

Additional file 4:

Full digital T-RFLP profiles. Examples of full digital T-RFLP profiles obtained with the restriction enzymes HaeIII and MspI for the samples GRW01 (A) and AGS01 (B).
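A digital T-RF is the distance from the labeled 5′ end of a sequence to the first restriction site. A minimal in silico digestion might look like the following, assuming each read starts at the labeled 5′ end and that reads without a restriction site are simply dropped. HaeIII cuts GG^CC and MspI cuts C^CGG; the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Recognition site and cut offset (bases from the start of the site) on the top strand.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "HaeIII": ("GGCC", 2),  # GG^CC
    "MspI": ("CCGG", 1),    # C^CGG
}


def terminal_fragment(seq: str, enzyme: str) -> Optional[int]:
    """Length of the 5'-terminal fragment, or None if the read has no site."""
    site, offset = ENZYMES[enzyme]
    pos = seq.upper().find(site)
    if pos == -1:
        return None  # assumption: reads without a site yield no T-RF
    return pos + offset


def digital_profile(reads: Iterable[str], enzyme: str) -> Dict[int, float]:
    """Relative abundance of each digital T-RF length across reads."""
    trfs = (terminal_fragment(r, enzyme) for r in reads)
    counts = Counter(t for t in trfs if t is not None)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}
```

Running `digital_profile` once per enzyme on the same reads gives the two profiles (HaeIII and MspI) shown for each sample.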

Format: PDF Size: 102KB

Additional file 5:

Comparison of mirror plots obtained on raw (left) and on denoised (right) pyrosequencing datasets. Examples are given for the sample GRW01 pyrosequenced with the HighRA method (A) and for the samples GRW07 (B) and AGS01 (C) pyrosequenced with the LowRA method.

Format: PDF Size: 273KB

Additional file 6:

Assessment of the cross-correlation and optimal lag between denoised dT-RFLP and eT-RFLP profiles. The denoised dT-RFLP profiles of samples AGS07 (A) and GRW04 (B) were both shifted by an optimal lag of −5 bp to match the corresponding eT-RFLP profiles. At this optimal lag, the maximum cross-correlation coefficients were 0.91 (AGS07) and 0.71 (GRW04).
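The lag search can be sketched as follows: the dT-RFLP profile is shifted bp by bp, correlated with the eT-RFLP profile at each lag, and the lag with the highest coefficient is kept. This sketch assumes profiles are dictionaries mapping T-RF length (bp) to relative abundance and uses a Pearson correlation per lag; the normalization used in the original analysis is not stated here, and the function names and `max_lag` default are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def to_vector(profile: Dict[int, float], lo: int, hi: int) -> List[float]:
    """Dense abundance vector over the bp range [lo, hi]."""
    return [profile.get(bp, 0.0) for bp in range(lo, hi + 1)]


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def optimal_lag(dtrf: Dict[int, float], etrf: Dict[int, float],
                max_lag: int = 10) -> Tuple[int, float]:
    """Return (lag, r) maximizing the correlation after shifting dT-RFs by `lag` bp."""
    # Pad the window by max_lag so shifted peaks never fall outside it.
    lo = min(min(dtrf), min(etrf)) - max_lag
    hi = max(max(dtrf), max(etrf)) + max_lag
    target = to_vector(etrf, lo, hi)
    best = (0, -math.inf)
    for lag in range(-max_lag, max_lag + 1):
        shifted = {bp + lag: v for bp, v in dtrf.items()}
        r = pearson(to_vector(shifted, lo, hi), target)
        if r > best[1]:
            best = (lag, r)
    return best
```

With a negative lag, a dT-RF at 100 bp is compared with an eT-RF at 95 bp, matching the −5 bp shift reported above.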

Format: PDF Size: 44KB

Additional file 7:

Alignment of sequences that map to the same Greengenes reference sequence (identical accession number) but yield different digital T-RFs. Examples are given for the Rhodocyclus tenuis affiliates (accession number AB200295) of sample AGS01 and for the Dehalococcoides relatives (accession number EF059529) of sample GRW05.
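One possible cause of this effect is that a single substitution (or sequencing error) inside a restriction site removes the first cut, moving the T-RF to the next site. The sketch below illustrates that mechanism for HaeIII and shows how references whose reads yield more than one distinct T-RF could be flagged. It assumes mapping results are available as (accession, T-RF length) pairs; the accessions `REF_A`/`REF_B`, the toy reads and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def haeiii_trf(seq: str) -> Optional[int]:
    """HaeIII (GG^CC) terminal fragment length from the 5' end, or None."""
    pos = seq.find("GGCC")
    return None if pos == -1 else pos + 2


def ambiguous_references(hits: Iterable[Tuple[str, int]]) -> Dict[str, Set[int]]:
    """Group T-RFs by reference accession and keep references with >1 distinct T-RF."""
    groups: Dict[str, Set[int]] = defaultdict(set)
    for accession, trf in hits:
        groups[accession].add(trf)
    return {acc: trfs for acc, trfs in groups.items() if len(trfs) > 1}


# Two hypothetical reads differing by one base in the first GGCC site.
read1 = "AAGGCCAAAAGGCC"
read2 = "AAGGCAAAAAGGCC"
hits = [("REF_A", haeiii_trf(read1)), ("REF_A", haeiii_trf(read2)), ("REF_B", 7), ("REF_B", 7)]
```

Here both hypothetical reads would map to `REF_A`, yet they produce T-RFs of 4 and 12 bp, so `ambiguous_references(hits)` flags only `REF_A`.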

Format: PDF Size: 57KB