This article is part of the supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)

Open Access Research

ComPhy: prokaryotic composite distance phylogenies inferred from whole-genome gene sets

Guan Ning Lin1, Zhipeng Cai2, Guohui Lin2, Sounak Chakraborty3 and Dong Xu1*

Author Affiliations

1 Digital Biology Laboratory, Informatics Institute, Computer Science Department and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA

2 Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada

3 Department of Statistics, University of Missouri, Columbia, MO 65211, USA

BMC Bioinformatics 2009, 10(Suppl 1):S5  doi:10.1186/1471-2105-10-S1-S5

Published: 30 January 2009

Abstract

Background

As whole-genome sequences become more widely available, using complete genome sequences to infer species phylogenies is becoming increasingly important. We developed a new tool, ComPhy ('Composite Distance Phylogeny'), which produces a prokaryotic phylogeny from a composite distance matrix calculated by comparing complete gene sets between genome pairs.

Results

The composite distance between two genomes is defined by three components: Gene Dispersion Distance (GDD), Genome Breakpoint Distance (GBD) and Gene Content Distance (GCD). GDD quantifies the dispersion of orthologous genes along the genomic coordinates from one genome to another; GBD measures the shared breakpoints between two genomes; GCD measures the level of shared orthologs between two genomes. The phylogenetic tree is constructed from the composite distance matrix using a neighbor-joining method. We tested our method on 9 datasets drawn from 398 completely sequenced prokaryotic genomes and achieved above 90% agreement in quartet topologies between the tree created by our method and the tree from Bergey's taxonomy. Compared with several other phylogenetic analysis methods, our method showed consistently better performance.
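
To make the idea of a composite distance matrix concrete, the following is a minimal Python sketch and not the ComPhy implementation. The abstract does not give the formulas, so everything here is an assumption for illustration: each genome is modelled as an ordered list of ortholog identifiers, GCD is taken as one minus the shared-ortholog fraction of the smaller genome, GBD is taken as the fraction of adjacencies among shared genes that are broken in the other genome, and the two are combined with equal weights. GDD is left out because the abstract does not describe it in enough detail to sketch. The function names, weights and toy genomes are all hypothetical.

```python
from itertools import combinations


def gene_content_distance(order_a, order_b):
    """Assumed GCD: 1 - (shared orthologs / size of the smaller gene set)."""
    set_a, set_b = set(order_a), set(order_b)
    smaller = min(len(set_a), len(set_b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(set_a & set_b) / smaller


def breakpoint_distance(order_a, order_b):
    """Assumed GBD: fraction of adjacencies in A (restricted to shared genes)
    that are not adjacent in B. Gene orientation is ignored."""
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    if len(a) < 2:
        return 1.0  # too few shared genes to compare order
    adjacent_in_b = {frozenset(pair) for pair in zip(b, b[1:])}
    broken = sum(1 for pair in zip(a, a[1:])
                 if frozenset(pair) not in adjacent_in_b)
    return broken / (len(a) - 1)


def composite_distance(order_a, order_b, w_gbd=0.5, w_gcd=0.5):
    """Weighted sum of the two assumed components (weights are hypothetical)."""
    return (w_gbd * breakpoint_distance(order_a, order_b)
            + w_gcd * gene_content_distance(order_a, order_b))


def distance_matrix(genomes):
    """Symmetric matrix of composite distances as a dict of dicts."""
    names = sorted(genomes)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d = composite_distance(genomes[x], genomes[y])
        matrix[x][y] = matrix[y][x] = d
    return names, matrix


# Toy genomes (hypothetical ortholog IDs in chromosomal order).
TOY_GENOMES = {
    "A": ["g1", "g2", "g3", "g4"],
    "B": ["g1", "g2", "g4", "g3"],
    "C": ["g1", "g5", "g6"],
}

if __name__ == "__main__":
    names, matrix = distance_matrix(TOY_GENOMES)
    for n in names:
        print(n, [round(matrix[n][m], 3) for m in names])
```

A matrix of this shape is what a neighbor-joining routine takes as input; in this toy data, genomes A and B (same genes, one rearrangement) come out much closer to each other than either is to C.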

Conclusion

ComPhy is a fast and robust tool for genome-wide inference of evolutionary relationships among genomes. It can be downloaded from http://digbio.missouri.edu/ComPhy.