Open Access Technical Note

The Supertree Tool Kit

Katie E Davis1* and Jon Hill2

Author Affiliations

1 Faculty of Biomedical & Life Sciences, Division of Ecology & Evolutionary Biology, Graham Kerr Building, University of Glasgow, Glasgow, G12 8QQ, UK

2 Applied Modelling and Computation Group, Earth Science and Engineering, Imperial College London, London, SW7 2AZ, UK

BMC Research Notes 2010, 3:95  doi:10.1186/1756-0500-3-95

Published: 8 April 2010

Abstract

Background

Large phylogenies are crucial for many areas of biological research. One way to create them is the supertree method, but building supertrees that contain thousands of taxa, and hence provide a comprehensive phylogeny, requires hundreds or even thousands of input source trees. Managing and processing these data in a systematic and error-free manner is challenging, and will become more so as supertrees contain ever-increasing numbers of taxa. Protocols for processing input source phylogenies have been proposed to ensure data quality, but no robust software implementations of these protocols yet exist.

Findings

The aim of the Supertree Tool Kit (STK) is to aid in the collection, storage and processing of input source trees for use in supertree analysis. It is therefore invaluable when creating supertrees containing thousands of taxa and hundreds of source trees. The STK is a Perl module with executable scripts that carry out the various steps in the processing protocols. To aid processing, we have added meta-data to each tree via XML. The meta-data record information such as the bibliographic source of the tree and how the data were derived, for instance the character data used in the original analysis. These data are essential parts of previously proposed protocols.
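To illustrate the idea of attaching XML meta-data to a source tree, the following is a minimal sketch in Python rather than the STK's own Perl code. The element names (source_tree, bibliographic_source, character_data, tree_string), the helper functions and the example values are all hypothetical and are not taken from the STK's actual file format.

```python
import xml.etree.ElementTree as ET


def build_tree_metadata(newick, authors, year, title, characters):
    """Pair one source tree with its provenance in a single XML record.

    All element and attribute names are illustrative assumptions,
    not the schema used by the STK.
    """
    root = ET.Element("source_tree")

    # Bibliographic source of the tree.
    source = ET.SubElement(root, "bibliographic_source")
    for name in authors:
        ET.SubElement(source, "author").text = name
    ET.SubElement(source, "year").text = str(year)
    ET.SubElement(source, "title").text = title

    # How the tree was derived: the kinds of character data analysed.
    chars = ET.SubElement(root, "character_data")
    for kind in characters:
        ET.SubElement(chars, "character", {"type": kind})

    # The tree itself, stored as a Newick string.
    ET.SubElement(root, "tree_string").text = newick
    return root


def character_types(xml_text):
    """Read back the character data types recorded for a tree."""
    root = ET.fromstring(xml_text)
    return [c.get("type") for c in root.iter("character")]


# Made-up example record; the study details are placeholders.
record = build_tree_metadata(
    newick="((A,B),(C,D));",
    authors=["Example A", "Example B"],
    year=2001,
    title="A hypothetical source study",
    characters=["molecular", "morphological"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(character_types(xml_text))
```

Keeping the provenance in the same record as the tree means that a later processing step can, for example, filter source trees by the type of character data without consulting a separate bibliography.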

Conclusions

The STK is a bioinformatics tool designed to make it easier to process hundreds or thousands of input source phylogenies for inclusion in supertree analysis, whilst reducing potential errors and enabling easy sharing of such datasets. It has been used successfully to create the largest known supertree to date, containing over 5000 taxa from over 700 source phylogenies.