Open Access Technical Note

Treetrimmer: a method for phylogenetic dataset size reduction

Shinichiro Maruyama [1,2,3], Robert JM Eveleigh [1,2,3,4] and John M Archibald [1,2,3]*

Author Affiliations

1 Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, NS, Canada

2 Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, NS, Canada

3 Integrated Microbial Biodiversity Program, Canadian Institute for Advanced Research, Montreal, QC H3A 1A4, Canada

4 McGill University and Génome Québec, 740 Docteur-Penfield Ave, Montreal, QC H3A 1A4, Canada

BMC Research Notes 2013, 6:145  doi:10.1186/1756-0500-6-145

Published: 12 April 2013

Abstract

Background

With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, applying rigorous tree-building methods to such large datasets is computationally prohibitive, and manual ‘pruning’ of sequence alignments is time-consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to carry out such pruning procedures objectively.

Findings

Here we present ‘TreeTrimmer’, a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined ‘redundant’ sequences, e.g., orthologous sequences from closely related organisms and ‘recently’ evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis.
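
The dereplication idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the tree is given as nested Python tuples, that a hypothetical `category` dictionary assigns each OTU to a user-defined taxonomic category, and that the first leaf of a redundant clade serves as its representative. The tool's actual input formats and selection criteria are not stated in this abstract.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf name (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf names of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def trim(tree: Tree, category: Dict[str, str]) -> Tree:
    """Replace each clade whose leaves all share one category with a
    single representative leaf (here, simply the first leaf)."""
    if isinstance(tree, str):
        return tree
    names = leaves(tree)
    if len({category[name] for name in names}) == 1:
        return names[0]  # whole clade is 'redundant': keep one OTU
    # Mixed clade: keep the branching, trim each child independently.
    return tuple(trim(child, category) for child in tree)


# Hypothetical example data (illustrative only).
tree = ((("human", "chimp"), "mouse"), ("yeast_a", "yeast_b"), "ecoli")
category = {
    "human": "Mammalia", "chimp": "Mammalia", "mouse": "Mammalia",
    "yeast_a": "Fungi", "yeast_b": "Fungi",
    "ecoli": "Bacteria",
}

trimmed = trim(tree, category)
print(trimmed)  # ('human', 'yeast_a', 'ecoli')
```

In this sketch every taxonomic category present in the input is still represented after trimming, while clades made up of a single category shrink to one OTU; clades that mix categories keep their branching order.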

Conclusions

TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.

Keywords:
TreeTrimmer; Phylogenetic tree; Pruning; Dereplication; Taxonomic category