BMC Bioinformatics

Open Access Research article

Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0)

Jean-Marie Cornuet1, Virginie Ravigné2 and Arnaud Estoup1*

Author Affiliations

1 INRA, UMR CBGP (INRA/IRD/Cirad/Montpellier SupAgro), Campus international de Baillarguet, CS 30016, F-34988 Montferrier-sur-Lez cedex, France

2 CIRAD, Unité Mixte de Recherche-Biologie et Génétique des Interactions Plante-Parasite, F-34398 Montpellier, France

BMC Bioinformatics 2010, 11:401 doi:10.1186/1471-2105-11-401

Published: 28 July 2010

Additional files

Additional file 1:

Pre-evaluation of model-prior combinations: two examples.

Pre-evaluation of model-prior combinations: example 1. A single pseudo-observed test data set (10 microsatellite loci) was first simulated under a model of a single population (sample size of 30 diploid individuals) with effective size N = 10,000. Microsatellite loci were assumed to follow a generalized stepwise mutation model (GSM [37]) with a mean mutation rate (mean μ) equal to 5 × 10⁻⁴ and a mean parameter of the geometric distribution of the length, in number of repeats, of mutation events (mean P) equal to 0.22. Each locus was given a possible range of 40 contiguous allelic states and was characterized by individual μ_loc and P_loc values drawn from Gamma(mean = mean μ, shape = 2) and Gamma(mean = mean P, shape = 2) distributions, respectively [12]. For ABC analysis of the test data set, we used the same population and marker models, and the prior distributions of parameters were as follows: Uniform[10; 1000] (figure A) or Uniform[2000; 20000] (figure B) for N, and Uniform[10⁻⁴; 10⁻³] and Uniform[0.1; 0.3] for mean μ and mean P, respectively. We chose three summary statistics (s): mean number of alleles, mean expected heterozygosity [38] and mean allele size variance per population. PCA on summary statistics (A and B) and the probability P(s_simulated < s_observed) for each summary statistic (C) were computed from 10,000 simulations, randomly drawing parameter values from priors.

Pre-evaluation of model-prior combinations: example 2. A single pseudo-observed test data set (10 microsatellite loci) was first simulated under a model of two populations (sample size of 30 diploid individuals per population) splitting at time t = 10,000 generations from an ancestral population, without subsequent migration. For all populations the effective size was N = 1,000.
For ABC analysis of the test data set, we used the same population and marker models, and the prior distributions of demographic parameters were as follows: Uniform[100; 1000] (figure D) or Uniform[2000; 20000] (figure E) for t, and Uniform[100; 2000] for N. The mutation model and priors for microsatellite markers were the same as in example 1. We chose eight summary statistics (s): mean number of alleles, mean expected heterozygosity [38] and mean allele size variance of each population sample, and the F_ST value and genetic distance (δμ)² between the two populations [39,40]. PCA on summary statistics (D and E) and the probability P(s_simulated < s_observed) for each summary statistic (F) were computed from 10,000 simulations, randomly drawing parameter values from priors.
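
The per-locus mutation parameters, the single-population summary statistics and the probability P(s_simulated < s_observed) described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DIYABC's code: the function names are hypothetical, Gamma(mean, shape) draws are expressed through Python's random.gammavariate(shape, scale) with scale = mean / shape, expected heterozygosity is assumed to take Nei's unbiased form n/(n − 1)(1 − Σp²), allele size variance is assumed to be the (n − 1) sample variance, and no coalescent simulation of the data sets is included.

```python
import random
import statistics
from collections import Counter


def draw_locus_parameters(mean_mu, mean_p, n_loci, shape=2.0, rng=random):
    """Draw per-locus (mu_loc, P_loc) pairs from Gamma(mean, shape=2) distributions.

    A Gamma with shape k and mean m has scale m / k; random.gammavariate takes
    (shape, scale). Assumption: this sketch does not truncate P_loc to (0, 1).
    """
    return [
        (rng.gammavariate(shape, mean_mu / shape), rng.gammavariate(shape, mean_p / shape))
        for _ in range(n_loci)
    ]


def locus_summaries(allele_sizes):
    """Number of alleles, expected heterozygosity and allele size variance at one locus.

    allele_sizes holds the size (in repeats) of every sampled gene copy, i.e. two
    entries per diploid individual. Assumed estimators: Nei's unbiased
    heterozygosity n / (n - 1) * (1 - sum(p_i ** 2)) and the (n - 1) sample variance.
    """
    n = len(allele_sizes)
    freqs = [count / n for count in Counter(allele_sizes).values()]
    expected_het = n / (n - 1) * (1.0 - sum(p * p for p in freqs))
    return len(freqs), expected_het, statistics.variance(allele_sizes)


def prob_simulated_below_observed(simulated, observed):
    """Proportion of simulated values of one statistic strictly below the observed value."""
    return sum(1 for s in simulated if s < observed) / len(simulated)


if __name__ == "__main__":
    rng = random.Random(1)
    params = draw_locus_parameters(5e-4, 0.22, n_loci=10, rng=rng)
    print(len(params))  # 10
    print(locus_summaries([20, 20, 21, 22]))  # 3 alleles, H close to 0.833, variance close to 0.917
    print(prob_simulated_below_observed([0.1, 0.4, 0.6, 0.9], 0.5))  # 0.5
```

A proportion close to 0 or 1 for a statistic indicates that the observed value lies in a tail of the distribution simulated from the priors.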

Format: PDF. Size: 595 KB.

Additional file 2:

Evaluation of the variation of RMAE values expected by chance between different replicates of 500 pseudo-observed data sets. Relative median absolute errors (RMAE) were computed for 10 replicates of 500 pseudo-observed data sets. Each data set includes 20 independent microsatellite loci and was generated under scenario 1 presented in Figure 1, with parameter values drawn from the same distributions as the prior distributions given in the legend of Figure 1. The demographic parameters N, t1, t2, t3, t4, t5, r1 and r2 are detailed in Figure 1. The standard deviations of RMAE values were equal to 0.009, 0.019, 0.004, 0.017, 0.012, 0.013 and 0.014 for N, t1, t2, t3, t4, t5, r1 and r2, respectively. Similar levels of RMAE variation among replicates of 500 pseudo-observed data sets were obtained for other categories of genetic markers (mtDNA and nuclear sequences) and for combinations of marker categories (results not shown).
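
The replicate comparison above can be sketched as follows, assuming RMAE is the median, over the pseudo-observed data sets of one replicate, of |estimate − true value| / true value. The ABC estimation step is replaced here by a hypothetical noisy estimator (estimate) and the prior by a hypothetical uniform draw (draw_true), so the printed numbers say nothing about DIYABC itself; only the bookkeeping (10 replicates of 500 data sets, then the standard deviation of the 10 RMAE values) mirrors the text.

```python
import random
import statistics


def rmae(estimates, true_values):
    """Relative median absolute error over the pseudo-observed data sets of one replicate.

    Assumed definition: median of |estimate - true| / true.
    """
    return statistics.median([abs(e - t) / t for e, t in zip(estimates, true_values)])


def rmae_replicates(n_replicates, n_pods, draw_true, estimate, rng):
    """RMAE for each of n_replicates independent batches of n_pods data sets."""
    values = []
    for _ in range(n_replicates):
        truths = [draw_true(rng) for _ in range(n_pods)]
        values.append(rmae([estimate(t, rng) for t in truths], truths))
    return values


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-ins, not DIYABC: a uniform prior for one parameter and a
    # noisy point estimate in place of a real ABC posterior point estimate.
    draw_true = lambda r: r.uniform(100, 10_000)
    estimate = lambda t, r: t * (1 + r.gauss(0, 0.2))
    reps = rmae_replicates(10, 500, draw_true, estimate, rng)
    print(f"mean RMAE = {statistics.mean(reps):.3f}, "
          f"SD across replicates = {statistics.stdev(reps):.3f}")
```

Computing the standard deviation across replicates, rather than reporting a single RMAE, gives a yardstick for deciding whether a difference in RMAE between two marker sets is larger than chance variation.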

Format: PDF. Size: 17 KB.

Additional file 3:

Principal component analysis of test quantities when performing model checking for the introduction scenarios 1, 2 and 3. Scenarios 1, 2 and 3 are detailed in Figure 2. The pseudo-observed test data set analyzed here was simulated under scenario 3. PCAs were performed on test quantities corresponding either to the summary statistics used to discriminate among scenarios and to compute the posterior distributions of parameters (a), or to other statistics (b). The summary statistics used as test quantities are detailed in the legend of Table 1.
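
A principal component analysis of test quantities like the one described above could be drawn as follows. This sketch is an assumption about the procedure, not DIYABC's implementation: it standardizes a matrix of simulated test quantities, extracts the first two principal components by power iteration with deflation (chosen only to stay within the Python standard library), and projects the observed data set into the same plane. The function names and the toy data are hypothetical.

```python
import random
import statistics


def column_moments(rows):
    """Mean and standard deviation of each column (one column per test quantity)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]  # guard against constant columns
    return means, sds


def covariance(centred_rows):
    """Sample covariance matrix of rows that are already centred on zero."""
    n, d = len(centred_rows), len(centred_rows[0])
    return [[sum(r[i] * r[j] for r in centred_rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k=2, iters=500, seed=0):
    """Leading eigenvectors of a symmetric matrix by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_project(simulated, observed, k=2):
    """Fit a PCA on simulated test quantities; project simulated and observed rows."""
    means, sds = column_moments(simulated)

    def z(row):
        return [(x - m) / s for x, m, s in zip(row, means, sds)]

    comps = top_components(covariance([z(r) for r in simulated]), k)

    def project(row):
        zr = z(row)
        return [sum(c_i * z_i for c_i, z_i in zip(c, zr)) for c in comps]

    return [project(r) for r in simulated], [project(r) for r in observed]


if __name__ == "__main__":
    rng = random.Random(7)
    # Toy test quantities: 200 simulated rows of three statistics, two of them correlated.
    simulated = []
    for _ in range(200):
        a = rng.gauss(0, 1)
        simulated.append([a, 0.8 * a + rng.gauss(0, 0.5), rng.gauss(0, 1)])
    observed = [[2.5, 2.4, 0.1]]
    sim_pcs, obs_pcs = pca_project(simulated, observed)
    print("observed data set on PC1/PC2:", obs_pcs[0])
```

Plotting sim_pcs as a cloud and obs_pcs as a single point gives a visual check: an observed point far outside the simulated cloud suggests that the scenario, with its fitted parameters, reproduces those test quantities poorly.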

Format: PDF. Size: 3.6 MB.
