BMC Evolutionary Biology

Open Access, Highly Accessed Methodology article

Bayesian inference of population size history from multiple loci

Joseph Heled1 and Alexei J Drummond1,2*

Author Affiliations

1 Department of Computer Science, University of Auckland, Auckland, New Zealand

2 Bioinformatics Institute, University of Auckland, Auckland, New Zealand

BMC Evolutionary Biology 2008, 8:289 doi:10.1186/1471-2148-8-289

Published: 23 October 2008

Abstract

Background

Effective population size (Ne) is related to genetic variability and is a basic parameter in many models of population genetics. A number of methods for inferring current and past population sizes from genetic data have been developed since JFC Kingman introduced the n-coalescent in 1982. Here we present the Extended Bayesian Skyline Plot, a non-parametric Bayesian Markov chain Monte Carlo algorithm that extends a previous coalescent-based method in several ways, including the ability to analyze multiple loci.

Results

Through extensive simulations, we show the accuracy and limitations of inferring population size as a function of the amount of data, including the recovery of information about evolutionary bottlenecks. We also analyzed two real data sets to demonstrate the behavior of the new method: a single-gene Hepatitis C virus data set sampled from Egypt and a 10-locus Drosophila ananassae data set representing 16 different populations.

Conclusion

The results demonstrate the essential role of multiple loci in recovering population size dynamics. Multi-locus data from a small number of individuals can precisely recover past bottlenecks in population size that cannot be characterized by analysis of a single locus. We also demonstrate that sequence data quality is important, because even moderate levels of sequencing error result in a considerable decrease in estimation accuracy for realistic levels of population genetic variability.
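
The benefit of multiple loci can be illustrated with a toy calculation. The sketch below is not the Extended Bayesian Skyline Plot. It assumes a constant haploid population of size N, time measured in generations, independent (unlinked) loci, and genealogies that are known exactly, so that the waiting time while k lineages remain is exponential with rate k(k-1)/(2N) under Kingman's coalescent. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_intervals(n_samples, pop_size, rng):
    """Simulate coalescent waiting times for one locus.

    Assumption: constant haploid population of size pop_size, time in
    generations, so the interval with k lineages has rate k(k-1)/(2N).
    Returns a list of (k, waiting_time) pairs for k = n_samples, ..., 2.
    """
    intervals = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / (2.0 * pop_size)
        intervals.append((k, rng.expovariate(rate)))
    return intervals


def estimate_pop_size(loci):
    """Maximum-likelihood estimate of a constant population size.

    Each scaled interval C(k,2) * t_k is exponential with mean N, so the
    estimate is the average of the scaled intervals pooled over all loci.
    Assumption: the genealogies (and hence the intervals) are known exactly.
    """
    scaled = [k * (k - 1) / 2.0 * t for locus in loci for k, t in locus]
    return sum(scaled) / len(scaled)


def replicate_estimates(n_loci, n_samples, pop_size, n_reps, seed):
    """Repeat the simulate-then-estimate experiment n_reps times."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        loci = [simulate_intervals(n_samples, pop_size, rng)
                for _ in range(n_loci)]
        estimates.append(estimate_pop_size(loci))
    return estimates


if __name__ == "__main__":
    # Hypothetical settings: 8 sampled sequences, true size 1000.
    for n_loci in (1, 4, 16):
        est = replicate_estimates(n_loci, n_samples=8, pop_size=1000.0,
                                  n_reps=500, seed=n_loci)
        print(n_loci, "loci: mean", round(statistics.mean(est), 1),
              "sd", round(statistics.stdev(est), 1))
```

Under these simplifying assumptions, the standard deviation of the estimate shrinks in proportion to one over the square root of the number of loci, because each additional locus contributes an independent set of coalescent intervals. Real analyses must also account for uncertainty in the genealogies and for population size changes through time, which this sketch ignores.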