Open Access Highly Accessed Data Note

Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis

Arlin Stoltzfus1*, Brian O'Meara2, Jamie Whitacre3, Ross Mounce4, Emily L Gillespie5, Sudhir Kumar6, Dan F Rosauer7 and Rutger A Vos8

Author Affiliations

1 Biochemical Science Division, NIST, 100 Bureau Drive, Gaithersburg, MD, USA

2 Department of Ecology & Evolutionary Biology, University of Tennessee, 569 Dabney Hall, Knoxville, TN, 37996-1610, USA

3 NMNH, Smithsonian Institution, Washington, DC, 20013-7012, USA

4 Department of Biology and Biochemistry, University of Bath, Bath, UK

5 Department of Biology, Marshall University, Huntington, WV, USA

6 Center for Evolutionary Medicine and Informatics, Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, AZ, 85287-5301, USA

7 Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA

8 NCB Naturalis, Einsteinweg 2, 2333 CC, Leiden, the Netherlands

BMC Research Notes 2012, 5:574  doi:10.1186/1756-0500-5-574

Published: 22 October 2012

Abstract

Background

Recently, various evolution-related journals have adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote data sharing reflects rapidly improving information technology and a rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use.

Findings

Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree.
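As an illustration of the pruning half of that re-use pattern, the following is a minimal sketch rather than a method taken from the article. It assumes a tree stored as nested Python tuples with taxon names as leaf strings; the taxon names, the example tree, and the helper functions prune and to_newick are all hypothetical. Pruning keeps only the requested taxa and collapses any internal node left with a single child.

```python
def prune(tree, keep):
    """Return the subtree induced by the taxa in `keep`, or None if none remain.

    `tree` is a nested tuple whose leaves are taxon-name strings
    (an assumed, simplified representation with no branch lengths).
    """
    if isinstance(tree, str):  # leaf: keep it only if requested
        return tree if tree in keep else None
    kept = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:  # collapse an internal node left with one child
        return kept[0]
    return tuple(kept)


def to_newick(tree):
    """Serialize the nested-tuple tree as a Newick string (without the final ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Hypothetical "much larger tree" and a user's taxon list.
big_tree = ((("Homo", "Pan"), "Gorilla"), ("Mus", "Rattus"))
pruned = prune(big_tree, {"Homo", "Gorilla", "Mus"})
print(to_newick(pruned) + ";")  # prints ((Homo,Gorilla),Mus);
```

Grafting would be the reverse operation, attaching a new taxon or subtree at a chosen node. Real data sets would normally also carry branch lengths and use an established tree library, both of which this sketch omits.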

Conclusions

The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting standard (MIAPA) indicating which kinds of data and metadata are most important for a re-useable phylogenetic record.

Keywords: Evolution; Phylogeny; Data sharing; Bioinformatics; Phyloinformatics; Standards