Open Access Software

Automated group assignment in large phylogenetic trees using GRUNT: GRouping, Ungrouping, Naming Tool

Daniel Dalevi1, Todd Z DeSantis2, Jakob Fredslund3, Gary L Andersen2, Victor M Markowitz1 and Philip Hugenholtz4*

Author Affiliations

1 Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA

2 Center for Environmental Biotechnology, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA

3 Bioinformatics Research Center, University of Aarhus, Høgh-Guldbergs Gade 10, Building 090, DK-8000 Århus C, Denmark

4 Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Dr., Walnut Creek, CA 94598, USA

BMC Bioinformatics 2007, 8:402  doi:10.1186/1471-2105-8-402

Published: 18 October 2007

Abstract

Background

Accurate taxonomy is best maintained if species are arranged as hierarchical groups in phylogenetic trees. This becomes especially important as trees grow larger with the rapid expansion of sequence databases. Hierarchical group names are typically assigned manually in trees, an approach that becomes infeasible for very large topologies.

Results

We have developed an automated iterative procedure for delineating stable (monophyletic) hierarchical groups in large (or small) trees and for naming those groups according to a set of sequentially applied rules. In addition, we have created an associated ungrouping tool that removes existing groups failing user-defined criteria (such as monophyly). The procedure is implemented in a program called GRUNT (GRouping, Ungrouping, Naming Tool) and has been applied to the current release of the Greengenes (Hugenholtz) 16S rRNA gene taxonomy, which comprises more than 130,000 taxa.
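
To make the monophyly criterion concrete, the following is a minimal Python sketch of how an ungrouping step might test groups against a rooted tree. It is not GRUNT's implementation: the nested-tuple tree representation and the names `clade_leaf_sets`, `is_monophyletic` and `ungroup` are hypothetical, chosen only for illustration. A group is treated as monophyletic when its members are exactly the leaves of some subtree.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Assumed representation: a tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clade_leaf_sets(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every clade (subtree) in a rooted tree."""
    clades: List[FrozenSet[str]] = []

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            # An internal node's leaf set is the union of its children's.
            leaves = frozenset().union(*(visit(child) for child in node))
        clades.append(leaves)
        return leaves

    visit(tree)
    return clades


def is_monophyletic(tree: Tree, members: Set[str]) -> bool:
    """True if `members` is exactly the leaf set of some clade in `tree`."""
    return frozenset(members) in set(clade_leaf_sets(tree))


def ungroup(tree: Tree, groups: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only the groups that are monophyletic in `tree`."""
    clades = set(clade_leaf_sets(tree))
    return {name: m for name, m in groups.items() if frozenset(m) in clades}


# Toy example (hypothetical taxa and group names).
tree = ((("A", "B"), "C"), ("D", "E"))
groups = {"G1": {"A", "B"}, "G2": {"A", "C"}, "G3": {"D", "E"}}
kept = ungroup(tree, groups)
print(sorted(kept))  # G2 is dropped: no clade contains A and C without B
```

The recursive traversal is adequate for small illustrations only; a tree with more than 100,000 taxa would need an iterative traversal and a more compact clade representation.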

Conclusion

GRUNT will assist researchers who require comprehensive hierarchical grouping of large tree topologies in, for example, database curation, microarray design and pangenome assignments. The application is available at the Greengenes website [1].