
Open Access Research article

A human genome-wide library of local phylogeny predictions for whole-genome inference problems

Srinath Sridhar (1) and Russell Schwartz (2, *)

Author Affiliations

1 Computer Science Department, Carnegie Mellon University, USA

2 Department of Biological Sciences, Carnegie Mellon University, USA


BMC Genomics 2008, 9:389  doi:10.1186/1471-2164-9-389

Published: 18 August 2008



Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding.


To facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes, based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history.
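As a rough illustration of what a site-reuse statistic of this kind could look like, the following Python sketch counts how many mutation events in a tree fall on sites that have already mutated elsewhere in the same tree. It assumes, for illustration only, that imperfection is the total number of mutation events minus the number of distinct variant sites, so that a perfect phylogeny (each site mutating exactly once) scores zero. The abstract does not give the exact definition, and the function name `imperfection` and the edge-list input format are hypothetical.

```python
from collections import Counter


def imperfection(edge_mutations):
    """Count reused variant sites in a phylogeny.

    edge_mutations: an iterable with one entry per tree edge, each entry
    being an iterable of site indices that mutate along that edge.
    (Hypothetical input format, chosen for this sketch.)

    Assumed definition: total mutation events minus distinct mutated
    sites. A perfect phylogeny, where every site mutates once, gives 0.
    """
    # Tally how many edges each site mutates on.
    counts = Counter(site for edge in edge_mutations for site in edge)
    total_events = sum(counts.values())
    distinct_sites = len(counts)
    return total_events - distinct_sites


# Toy tree with four edges; site 0 mutates on two different edges,
# so exactly one mutation event is a reuse.
toy_tree = [[0], [1], [2, 0], [3]]
print(imperfection(toy_tree))
```

Under this assumed definition, a region whose tree needs many repeated mutations scores high, which is consistent with the statistic tracking recombination rate: recombinant haplotypes cannot be explained by a mutation-only tree without reusing sites.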


Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.