BMC Bioinformatics

Open Access, Highly Accessed, Software

PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees

Philippe Gouret1*, Julie D Thompson2 and Pierre Pontarotti1

Author Affiliations

1 UMR 6632, Evolutionary Biology and Modeling, University of Provence, 3 place Victor Hugo, 13331 Marseille, France

2 IGBMC, (CNRS/INSERM/ULP), Biology and Structural Genomics Department, BP 10142, 67404 Illkirch Cedex, France

BMC Bioinformatics 2009, 10:298 doi:10.1186/1471-2105-10-298

Published: 19 September 2009

Abstract

Background

To effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted.

Results

Here, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, which address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on i) the use of predefined or user-defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, and iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them to the other.
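To make the three tasks concrete, the following is a minimal Python sketch of node annotation, pattern matching, and pattern-based tree comparison on a toy tree. It is not PhyloPattern's interface: the names Node, Pattern, annotate, matches, find_all and pattern_from are hypothetical and chosen only for illustration, and the pattern language here (a node test plus sub-patterns for distinct children) is an assumed simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Node:
    """Toy tree node: leaves carry a species label, internal nodes have children."""
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    tags: dict = field(default_factory=dict)


@dataclass
class Pattern:
    """Hypothetical pattern: a node test plus sub-patterns for distinct children."""
    test: Callable[[Node], bool]
    children: List["Pattern"] = field(default_factory=list)


def annotate(node: Node) -> None:
    """Node annotation: tag every node with the set of species beneath it."""
    for child in node.children:
        annotate(child)
    if node.children:
        node.tags["species"] = set().union(*(c.tags["species"] for c in node.children))
    else:
        node.tags["species"] = {node.species}


def matches(pattern: Pattern, node: Node) -> bool:
    """Pattern matching: the node must pass the test, and each sub-pattern
    must match a different child (child order is ignored)."""
    if not pattern.test(node):
        return False

    def assign(subs: List[Pattern], free: List[Node]) -> bool:
        if not subs:
            return True
        head, rest = subs[0], subs[1:]
        return any(
            matches(head, child) and assign(rest, free[:i] + free[i + 1:])
            for i, child in enumerate(free)
        )

    return assign(pattern.children, node.children)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_all(pattern: Pattern, root: Node) -> List[Node]:
    """Return every node in the tree at which the pattern matches."""
    return [n for n in walk(root) if matches(pattern, n)]


def pattern_from(node: Node) -> Pattern:
    """Tree comparison: generate a pattern from an annotated tree that
    requires the same clades (species sets) at every level."""
    wanted = frozenset(node.tags["species"])
    return Pattern(
        lambda n, w=wanted: n.tags["species"] == w,
        [pattern_from(c) for c in node.children],
    )


def leaf(name: str) -> Pattern:
    """Pattern matching a leaf with the given species label."""
    return Pattern(lambda n: not n.children and n.species == name)


# Toy trees: ((human, mouse), fish), the same tree reordered, and a different topology.
t1 = Node(children=[Node(children=[Node("human"), Node("mouse")]), Node("fish")])
t2 = Node(children=[Node("fish"), Node(children=[Node("mouse"), Node("human")])])
t3 = Node(children=[Node(children=[Node("human"), Node("fish")]), Node("mouse")])
for tree in (t1, t2, t3):
    annotate(tree)

# User-defined pattern: an internal node grouping a human leaf with a mouse leaf.
human_mouse = Pattern(lambda n: len(n.children) == 2, [leaf("human"), leaf("mouse")])
hits = find_all(human_mouse, t1)

# Pairwise comparison: build a pattern from t1 and apply it to the other trees.
same_as_t2 = matches(pattern_from(t1), t2)
same_as_t3 = matches(pattern_from(t1), t3)
```

In this sketch the comparison reuses the matcher: the pattern generated from one tree is simply applied to the root of the other, which mirrors the description above at a very small scale.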

Conclusion

PhyloPattern greatly simplifies and accelerates the work of the computer scientist in the evolutionary biology field. The library has been used to automatically identify phylogenetic evidence for domain shuffling or gene loss events in the evolutionary histories of protein sequences. However, any workflow that relies on phylogenetic tree analysis could be automated with PhyloPattern.