Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland

Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland

Ecole d'ingénieurs de Fribourg, 1705 Fribourg, Switzerland

Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland

Abstract

Background

The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations.

Results

Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms including rejection sampling, MCMC without likelihood, a particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can interact with most simulation and summary statistics computation programs. The usability of ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of

Conclusion

ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, through data simulation, computation of summary statistics, estimation of posterior distributions, model choice and validation of the estimation procedure, to visualization of the results.

Background

Bayesian statistics has gained popularity in scientific inference, especially in population genetics and genomics. Consider a model with parameters **θ** whose joint prior density is denoted by π(**θ**). The quantity of interest is the posterior distribution of the parameters **θ**, which is given by Bayes' theorem as π(**θ**|D) ∝ f(D|**θ**)π(**θ**), where f(D|**θ**) is the likelihood of the data D. Unfortunately, the evaluation of likelihoods is often difficult or even impossible for complex models. However, Monte Carlo simulations can be used to approximate the likelihood function. For instance, a simple rejection algorithm has been proposed in which a parameter vector **θ** is simulated from a prior distribution and accepted if the corresponding vector of summary statistics **S** is sufficiently "close" to the observed summary statistics **S**_{obs} with respect to some metric in the space of **S** (i.e., if ||**S** - **S**_{obs}|| ≤ ε for a fixed tolerance ε). The precision of the posterior estimate improves with smaller values of ε, but small ε values are often associated with very small acceptance rates (which are proportional to the likelihood) and thus require many more computations.
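The rejection scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ABCtoolbox implementation: the toy model (normal data with unknown mean, summarized by the sample mean), the uniform prior, the Euclidean metric and all function names are hypothetical choices made for the example.

```python
import math
import random


def abc_rejection(s_obs, prior_sample, simulate_stats, epsilon, n_sims, rng):
    """Keep parameter draws whose simulated statistics lie within epsilon of s_obs."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample(rng)            # draw theta from the prior
        s = simulate_stats(theta, rng)       # simulate data and reduce them to S
        if math.dist(s, s_obs) <= epsilon:   # Euclidean ||S - S_obs|| <= epsilon
            accepted.append(theta)
    return accepted


# Toy model (an assumption for illustration): 20 normal observations with
# unknown mean theta and unit variance; the summary statistic is the sample mean.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate_stats(theta, rng):
    data = [rng.gauss(theta, 1.0) for _ in range(20)]
    return (sum(data) / len(data),)


rng = random.Random(1)
s_obs = (1.5,)
loose = abc_rejection(s_obs, prior_sample, simulate_stats, 1.0, 20000, rng)
tight = abc_rejection(s_obs, prior_sample, simulate_stats, 0.1, 20000, rng)
print(len(loose), len(tight))  # a smaller epsilon accepts fewer draws
```

The two runs illustrate the trade-off stated in the text: the tighter tolerance retains far fewer simulations, so reaching a given number of accepted draws requires more computation.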

More recently, Beaumont

Here, we present ABCtoolbox, a series of computer programs that can be pipelined to estimate parameters of complex models. In complement to available ABC packages

Implementation

An ABC estimation is typically done in two steps: a large number of simulations are first carried out (simulation step) and then used to estimate posterior distributions (estimation step). The package incorporates two main programs for these two steps (Figure

**Flowchart describing the individual steps of an ABC estimation by ABCtoolbox**. Black arrows indicate the standard approach. Some alternative paths are shown with dotted lines. For instance, the output of a simulation program can be modified to take specific characteristics of the observed data into account, such as a given level of missing data. Additionally,

Interaction of

The program

Results and Discussion

We demonstrate the use of

Assumed Model of

Following previous findings _{DIV} generations ago. We describe here the model first for mtDNA. Each lineage is modeled as a large panmictic continent-island model where samples are taken from small islands. The effective size of the _{i} is assumed to be normally distributed with mean _{N}. All remaining (and unsampled) populations are collectively represented by a large panmictic continent (of arbitrary size set to 10^{7} individuals). As commonly assumed in continent-island models, migration is only allowed from the islands to the continent looking backward in time, and is constant over time and of rate _{m} for each island. Under this model, sampled populations do not directly exchange genes, which implies that the most recent common ancestor of any pair of genes drawn from different populations is not found in any of the sampled populations. This seems a very reasonable assumption given the huge number of

**Evolutionary model of the demographic history of two groups of populations corresponding to the Central and Eastern mtDNA evolutionary lineages of the common vole Microtus arvalis**. An ancestral population of size

We used SIMCOAL2 to perform genetic simulations and _{males}) for microsatellites, where _{males} is independently drawn from its prior distribution. This parameterization simply indicates that the number of mtDNA genes is equal to the number of diploid females in the population and that mtDNA gene flow only occurs through females. Microsatellite diversity depends here on both male and female individuals, and we assumed that the number of autosomal genes is equal to two times the number of males and females in the population, the number of males potentially differing from that of females by a factor _{DNA} was based on previous estimates _{STR})
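The gene-count parameterization described above can be made concrete with a short sketch. The function and variable names (`gene_counts`, `n_females`, `male_factor`) are hypothetical, and the log-uniform bounds used for the male factor are an assumption taken from the row with the "males" subscript in the table of priors; this is an illustration of the arithmetic, not the input format of SIMCOAL2 or ABCtoolbox.

```python
import random


def sample_log_uniform(low_exp, high_exp, rng):
    """Draw 10**u with u ~ U[low_exp, high_exp]."""
    return 10.0 ** rng.uniform(low_exp, high_exp)


def gene_counts(n_females, male_factor):
    """Return (mtDNA genes, autosomal genes) for one population."""
    n_males = male_factor * n_females          # males differ from females by a factor
    n_mtdna = n_females                        # one mtDNA gene per diploid female
    n_autosomal = 2.0 * (n_males + n_females)  # two gene copies per individual
    return n_mtdna, n_autosomal


rng = random.Random(7)
male_factor = sample_log_uniform(-1.5, 1.0, rng)  # assumed prior bounds
print(gene_counts(100, male_factor))
print(gene_counts(100, 1.0))  # equal numbers of males and females: (100, 400.0)
```

With a male factor of 1, the autosomal gene count is four times the mtDNA gene count, which is the usual expectation for equal sex ratios.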

Characteristics of the prior and obtained posterior distributions.

| Parameter | Prior | Mode | HPDI50^{c} | HPDI90^{c} |
|---|---|---|---|---|
|  | U [10, 500] | 73.89 | [32.12, 125.49] | [10.00, 238.53] |
| σ_{N} | U [10, 200] | 166.58 | [126.48, 188.54] | [59.65, 200.00] |
| _{A}^{a} | 10^{U [3,6.5]} | 86,000 | [46100, 153400] | [17800, 300000] |
| ^{a} | 10^{U [-1,1]} | 0.25 | [0.16, 0.80] | [0.10, 2.80] |
| ^{a} | 10^{U [-1.5,1]} | 0.16 | [0.10, 0.23] | [0.05, 0.35] |
| _{males}^{a} | 10^{U [-1.5,1]} | 1.11 | [0.56, 1.91] | [0.26, 3.92] |
| _{DIV}^{b} | U [40,000, 80,000] | 18,600 | [16600, 21800] | [16000, 28100] |
| _{DNA} × 10^{8} | U [10^{-8}, 5*10^{-7}] | 8.37 | [6.29, 11.14] | [3.19, 14.78] |
| _{STR} × 10^{4} | U [10^{-5}, 5*10^{-4}] | 1.33 | [1.09, 1.61] | [0.52, 2.39] |
|  | U [8.00, 12.00] | 9.05 | [8.60, 10.33] | [8.14, 11.52] |

^{a}The posterior distributions of these parameters were estimated on a logarithmic scale. The reported posterior characteristics were then transformed onto the natural scale.

^{b}Whereas the prior of the divergence time _{DIV} is expressed in generations, its posterior is expressed here in years for convenience, assuming 2.5 generations per year.

^{c}The Highest Posterior Density Interval (HPDI) is chosen as the continuous interval of parameter values with highest posterior density.
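To illustrate the HPDI definition in the table footnote, the following sketch estimates such an interval from a sample of posterior draws. It is a simple sample-based approximation that assumes a unimodal posterior: it takes the shortest contiguous interval containing the requested fraction of sorted draws. It is not necessarily how ABCestimator computes the intervals reported in the table, and the log-normal toy sample is purely illustrative.

```python
import math
import random


def hpdi(samples, mass):
    """Shortest contiguous interval containing a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


rng = random.Random(3)
# Right-skewed toy posterior sample (log-normal), purely illustrative.
draws = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(20000)]
lo50, hi50 = hpdi(draws, 0.50)
lo90, hi90 = hpdi(draws, 0.90)
print((lo50, hi50), (lo90, hi90))
```

For a skewed posterior, such an interval is not symmetric around the mode, which is why HPDI bounds in the table can sit at unequal distances from the reported mode.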

Estimation Procedure and Validation

We computed a total of 338 summary statistics on the 11 populations and both mtDNA and microsatellite datasets, which were then reduced to 7 PLS components using specific R scripts of ^{6 }steps with tolerance ^{6 }simulations performed under a standard rejection sampling ^{6 }simulations to estimate parameters for all 1000 datasets. We report in Figure

**Distributions of the quantiles (x-axis) of the known parameter values as inferred from the posterior distributions obtained with ABCestimator for 1000 pseudo-observed data sets**. These distributions are expected to be uniform if posterior densities have appropriate coverage properties.
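The coverage check described in this caption can be sketched on a toy model whose posterior is known exactly. The assumptions here are ours: a N(0, 1) prior on a mean, unit-variance normal data, and a Kolmogorov-Smirnov distance as the measure of departure from uniformity. In a real ABC analysis, the exact posterior would be replaced by the estimated one.

```python
import random
from statistics import NormalDist


def posterior_quantile(theta_true, n_obs, rng):
    """Quantile of theta_true within the exact posterior of a toy normal model."""
    data = [rng.gauss(theta_true, 1.0) for _ in range(n_obs)]
    post_var = 1.0 / (n_obs + 1.0)  # prior N(0, 1), unit-variance observations
    post_mean = post_var * sum(data)
    return NormalDist(post_mean, post_var ** 0.5).cdf(theta_true)


def ks_uniform(values):
    """Kolmogorov-Smirnov distance between values and the U(0, 1) distribution."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(11)
# 1000 pseudo-observed data sets, each with a known parameter drawn from the prior.
quantiles = [posterior_quantile(rng.gauss(0.0, 1.0), 10, rng) for _ in range(1000)]
d_stat = ks_uniform(quantiles)
print(d_stat)  # small values are compatible with uniformity
```

A posterior that is too narrow would pile quantiles near 0 and 1, whereas one that is too wide would concentrate them around 0.5; both departures inflate the distance.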

We report in Figure _{f}~75), but with a large standard deviation (_{DIV} ~ 18,500 years). The strong demographic differences between males and females inferred here suggest that an incomplete picture may arise when studying colonization processes or inferring demographic history based on mtDNA alone.

**Posterior distributions obtained with ABCestimator based on a likelihood-free MCMC chain of 10^{6} steps performed by ABCsampler**. Additional characteristics of the posterior distributions, along with the prior distributions, are given in Table 1. See Figure 2 and text for parameter description.

Conclusions

The flexibility of

Availability and requirements

Project name: ABCtoolbox

Project home page:

Operating system(s): Platform independent with supported Linux and Windows binaries

Other requirements: Windows users need to install the CYGWIN Linux-like environment for Windows, available on

Programming language: C++ and R

License: GNU GPL version 3 or later

Authors' contributions

DW, SN and LE designed and implemented ABCsampler. DW and CL designed and implemented ABCestimator. DW implemented all other programs and scripts. DW and LE performed the analysis and wrote the paper. All authors have read and approved the final manuscript.

Acknowledgements

We thank Gerald Heckel for helpful comments on an earlier draft of the manuscript.