Open Access Highly Accessed Software

Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient

Arlin Stoltzfus1*, Hilmar Lapp2, Naim Matasci3, Helena Deus4, Brian Sidlauskas5, Christian M Zmasek6, Gaurav Vaidya7, Enrico Pontelli8, Karen Cranston2, Rutger Vos9, Campbell O Webb10, Luke J Harmon11, Megan Pirrung12, Brian O'Meara13, Matthew W Pennell11, Siavash Mirarab14, Michael S Rosenberg15, James P Balhoff2, Holly M Bik16, Tracy A Heath17, Peter E Midford2, Joseph W Brown11, Emily Jane McTavish18, Jeet Sukumaran19, Mark Westneat20, Michael E Alfaro21, Aaron Steele22 and Greg Jordan23

Author Affiliations

1 Institute for Bioscience and Biotechnology Research (IBBR), Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA

2 National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA

3 The iPlant Collaborative and EEB Department, University of Arizona, 1657 E Helen St, Tucson, AZ, 85721, USA

4 Digital Enterprise Research Institute, National University of Ireland, University Road, Galway, Ireland

5 Department of Fisheries and Wildlife, Oregon State University, 104 Nash Hall, Corvallis, OR, 97331-3803, USA

6 Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA, 92037, USA

7 Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, 80309-0334, USA

8 Department of Computer Science, New Mexico State University, MSC CS, Box 30001, Las Cruces, NM, 88003, USA

9 NCB Naturalis, Einsteinweg 2, Leiden, 2333 CC, the Netherlands

10 Arnold Arboretum of Harvard University, Boston, MA, 02130, USA

11 Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA

12 University of Colorado Denver Anschutz Medical Campus, Aurora, CO, 80045, USA

13 Department of Ecology & Evolutionary Biology, 569 Dabney Hall, University of Tennessee, Knoxville, TN, 37996, USA

14 Department of Computer Science, University of Texas at Austin, Austin, TX, 78701, USA

15 Center for Evolutionary Medicine and Informatics, The Biodesign Institute, and School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA

16 UC Davis Genome Center, One Shields Ave, Davis, CA, 95618, USA

17 Department of Integrative Biology, University of California, Berkeley, CA, 94720-3140, USA

18 University of Texas at Austin, BEACON, Austin, TX, USA

19 Biology Department, Duke University, Biological Sciences Building, 125 Science Drive, Durham, NC, 27708, USA

20 Biodiversity Synthesis Center, Field Museum of Natural History, 1400 S Lakeshore Dr, Chicago, IL, 60605, USA

21 Department of Ecology and Evolutionary Biology, University of California Los Angeles, 621 Charles E. Young Dr South, Los Angeles, CA, 90095, USA

22 U.C. Berkeley Museum of Vertebrate Zoology, University of California, 3101 Valley Life Sciences Building, Berkeley, CA, 94720, USA

23 Paperpile, 34 Houghton Street, Somerville, MA, 02143, USA


BMC Bioinformatics 2013, 14:158  doi:10.1186/1471-2105-14-158

Published: 13 May 2013

Abstract

Background

Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great “Tree of Life” (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user’s needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces.
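The workflow outlined above (rectify names, then prune an expert phylogeny down to the queried species) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than part of any phylotastic component: the toy `MEGATREE`, the one-entry `SYNONYMS` table standing in for a name-resolution service, and the helper names `rectify`, `prune`, and `to_newick` are all hypothetical. Branch lengths and annotations are omitted.

```python
from typing import Optional, Union

# A tree is either a leaf name (str) or a tuple of subtrees.
Tree = Union[str, tuple]

# Hypothetical synonym table standing in for a name-resolution service.
SYNONYMS = {"Felis domesticus": "Felis catus"}

# Toy "expert" tree, for illustration only.
MEGATREE = ((("Homo sapiens", "Pan troglodytes"), "Mus musculus"),
            ("Felis catus", "Canis lupus"))


def rectify(names):
    """Map each query name to its accepted form (unchanged if unknown)."""
    return {SYNONYMS.get(name, name) for name in names}


def prune(tree: Tree, keep: set) -> Optional[Tree]:
    """Return the subtree induced by the leaves in `keep`, or None if empty."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single child
    return tuple(kept)


def to_newick(tree: Tree) -> str:
    """Serialize the nested-tuple tree as a Newick string without ';'."""
    if isinstance(tree, str):
        return tree.replace(" ", "_")
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


query = ["Homo sapiens", "Felis domesticus", "Canis lupus"]
custom = prune(MEGATREE, rectify(query))
print(to_newick(custom) + ";")
```

In this sketch the non-standard name "Felis domesticus" is mapped to "Felis catus" before pruning, and internal nodes left with a single descendant are collapsed so that the result contains only the queried species.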

Results

With the aim of building such a “phylotastic” system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) a new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image.

Conclusions

Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.

Keywords:
Phylogeny; Taxonomy; Hackathon; Web services; Data reuse; Tree of life