Institute of Fundamental Sciences, Massey University, Private Bag 11 222, Palmerston North, 4442, New Zealand

School of Mathematics and Physics, University of Tasmania, Hobart, Australia

Abstract

Background

Recombination rates vary at the level of the species, population and individual. Now recognized as a transient feature of the genome, recombination rates at a given locus can change markedly over time. Existing inferential methods, predominantly based on linkage disequilibrium patterns, return a long-term average estimate of past recombination rates. Such estimates can be misleading, but no analytical framework to infer recombination rates that have changed over time is currently available.

Results

We apply coalescent modeling in conjunction with a suite of summary statistics to show that the recombination history of a locus can be reconstructed from a time series of genetic samples. More usefully, we describe a new method, based on

Conclusions

While providing an important stepping-stone to determining past recombination rates,

Background

Meiotic recombination, whereby DNA variants are shuffled between homologous parental chromosomes, is a fundamental process in the evolution of genetic diversity. For many years poorly studied, the mechanisms and effects of recombination are now increasingly well understood

Recent studies have demonstrated convincingly that recombination rates at a given locus vary at the level of the species, population and individual. Comparisons between the chimpanzee and human genomes show poor correlation of both hotspot and background recombination rates at orthologous loci

How changes in recombination rate are controlled is less well understood

Recombination is typically detected either directly by gamete typing, or indirectly from linkage disequilibrium (LD) patterns

The main point is that recombination rates at a genomic location can vary substantially through time. Although this fact is now widely appreciated

Results

Correlation and sensitivity of summary statistics

We first explored how different summary statistics respond to recombination events. The number of segregating sites _{min}, _{nS}. These summaries likely recognize different aspects of recombination, although the relationships between them have not been explored. Certainly none of these summaries capture the entire recombination profile of a genetic sample (i.e., they are not statistically sufficient).
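As an illustration of how one such summary responds to recombination, the sketch below counts four-gamete violations and derives a lower bound on the number of recombination events, in the spirit of the four-gamete-based minimum described in the Methods. It is a minimal sketch, not the libsequence implementation used in this study: the function names are hypothetical, and haplotypes are assumed to be equal-length strings of biallelic `0`/`1` characters.

```python
from itertools import combinations


def four_gamete_violation(haps, i, j):
    """Return True if sites i and j carry all four gametes (00, 01, 10, 11).

    Assumes biallelic sites coded as '0'/'1' (an assumption of this sketch).
    """
    return len({(h[i], h[j]) for h in haps}) == 4


def min_recombination_events(haps):
    """Lower bound on recombination events from incompatible site pairs.

    Every incompatible pair (i, j) implies at least one recombination event
    between sites i and j. The bound is the largest number of such intervals
    that do not overlap (intervals may share an end point), found greedily
    by scanning intervals in order of their right end point.
    """
    n_sites = len(haps[0])
    intervals = [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gamete_violation(haps, i, j)
    ]
    intervals.sort(key=lambda interval: interval[1])
    count, last_right = 0, 0
    for left, right in intervals:
        if left >= last_right:  # does not overlap the last chosen interval
            count += 1
            last_right = right
    return count
```

For example, `min_recombination_events(["00", "01", "10", "11"])` reports one event, whereas a sample in which the two sites are in complete association, such as `["00", "00", "11", "11"]`, reports none.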

We studied the correlation matrix between summary statistics using an equal mix of datasets with linearly increasing, decreasing and constant recombination rates. _{min}, which showed little variation among datasets under the conditions modeled here. None of the summary statistics were perfectly correlated, thus emphasizing that multiple summaries are needed to capture different aspects of the recombination profile.
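A correlation matrix of this kind can be sketched as follows, assuming Pearson's coefficient and one value of each summary statistic per simulated dataset. The function names and the dictionary layout are illustrative assumptions. A statistic that does not vary among datasets has no defined correlation, so this sketch returns NaN in that case.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant statistic carries no information about covariation.
        return float("nan")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(table):
    """Pairwise correlations for a dict {statistic name: values per dataset}."""
    names = list(table)
    return {a: {b: pearson(table[a], table[b]) for b in names} for a in names}
```

Calling `correlation_matrix({"S": [...], "nHaps": [...]})` returns a nested dictionary indexed by the two statistic names.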

**Correlations between recombination summary statistics.** (Upper diagonal) Scatter plots show pairwise relationships among the summary statistics. (Lower diagonal) Pie charts show the magnitude of the correlation with blue and red indicating positive and negative values (e.g., Pearson’s _{nS}). All non-zero correlations are statistically significant (

To determine how these summaries respond to different recombination rates, we simulated genetic data under a wide range of constant recombination values (0–10

**Sensitivity of summary statistics to different constant recombination rates.** Black lines show the mean (solid) and 95% confidence intervals (dotted) of summary statistic values. The red line indicates different constant recombination rates (

Tracking changing recombination rates using time series data

It is less obvious how summary statistics might covary with recombination rates that change over time. To explore this process, we generated coalescent simulations where recombination rates were allowed to vary over many generations. Genetic datasets were simulated using coalescent software 10^{4} replicates of 10-kb autosomal sequences were drawn from a constant sized population (N_{e} = 10^{4}) ^{-8} events/bp/generation 10^{4} generations (cf.

A representative example illustrating a logistic decline in recombination rates towards the present is presented in Figure

**Response of summary statistics to recombination rates changing logistically over time.** Black lines show mean (solid) and 95% confidence intervals (dotted) of summary statistic values. The red line indicates how the recombination rate changes over time (

Response of summary statistics to constant recombination rates.

Response of summary statistics to recombination rates decreasing linearly.

Response of summary statistics to recombination rates increasing linearly.

Response of summary statistics to recombination rates decreasing exponentially.

Response of summary statistics to recombination rates increasing exponentially.

Response of summary statistics to recombination rates decreasing logistically.

Response of summary statistics to recombination rates increasing logistically. (PDF 279 kb)

Most of the summary statistics tracked the changing recombination profile, albeit with notable differences in accuracy. The variance of many summaries altered with the recombination rate, thus suggesting that different summaries have greatest power to estimate recombination rates at different times. This reinforces the view that using a combination of summary statistics should maximize statistical power, although a simple linear combination may not necessarily be optimal.

Note too that summary values typically lagged changes in the recombination rate. Genetic variation observed in the present was actually laid down in the (sometimes very distant) past

Reconstructing past recombination rates from data taken at a single time point

Tracking variable recombination rates using time series data may be feasible for some fast evolving systems (e.g., exploring the loss of sexual competency in yeast), but it is not practical for long-lived organisms like humans. To explore whether past recombination rates can be reconstructed from genetic data taken at a single time point, we developed a novel bootstrapping methodology that we call

Mutations occur randomly through time. In any given dataset, some polymorphisms will be old and most modern lineages will carry them. Others will be young, and will therefore be found in only one or two individuals. By determining whether recombination events affect young or old polymorphisms, we can theoretically obtain snapshots of recombination rates through time.

This concept is best shown graphically (Figure

**Variable ages of quartets.** Randomly selected quartets (black lines) capture information about (**A**) young, (**B**) medium and (**C**) old time depths. For visual clarity, quartets are shown on a non-recombining genealogy, but the principle holds equally for ancestral recombination graphs.

The use of resampling methods, such as the bootstrap and jackknife

To ascertain whether ^{4} datasets each) (Additional file ^{3} quartets were generated for each dataset, and the suite of summary statistics was calculated for each subsample. The mean, variance and maximum of these summary statistic distributions were recorded.
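The subsampling step described here can be sketched as below: random quartets are drawn from a sample of haplotypes, a summary statistic is computed on each quartet, and the mean, variance and maximum of the resulting distribution are recorded. This is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, haplotypes are assumed to be equal-length strings, the number of segregating sites stands in for the full suite of statistics, and the sample variance is used because the source does not say which estimator was applied.

```python
import random
from statistics import mean, variance


def segregating_sites(haps):
    """Number of alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*haps) if len(set(column)) > 1)


def tuple_summaries(haps, statistic, tuple_size=4, n_subsamples=1000, seed=0):
    """Summarize a statistic over random subsamples of `tuple_size` haplotypes.

    Each subsample is drawn without replacement from the full sample; the
    mean, variance and maximum of the per-subsample values are returned.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = [
        statistic(rng.sample(haps, tuple_size)) for _ in range(n_subsamples)
    ]
    return {
        "mean": mean(values),
        "variance": variance(values),
        "maximum": max(values),
    }
```

A call such as `tuple_summaries(haps, segregating_sites)` yields the three distribution summaries for one dataset; changing `tuple_size` gives larger subsamples.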

Although powerful Bayesian and maximum likelihood methods have been developed to perform inference on such datasets

LDA was performed on all datasets from all three recombination models. Each dataset was sequentially excluded, the optimal transform inferred by LDA was applied, and each dataset was reassigned back to a recombination model. As we have three models, assignment rates of one-third are expected just by chance. Assignment rates approaching one indicate increasingly accurate assignments.
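The exclude, refit and reassign loop can be illustrated with a deliberately reduced case. The sketch below assumes a single feature per dataset and equal prior weights for the models; under those assumptions, and with a pooled variance, the linear discriminant rule reduces to assigning a dataset to the model with the nearest class mean. It is therefore a simplified stand-in for the multi-feature LDA used in the study, and the function name is hypothetical.

```python
def jackknife_assignment_rate(values, labels):
    """Leave-one-out assignment rate for a one-feature discriminant rule.

    For each dataset in turn: drop it, re-estimate the class means from the
    remaining datasets, assign the dropped dataset to the class with the
    nearest mean, and record whether that matches its true model.
    """
    correct = 0
    for i, (x, true_label) in enumerate(zip(values, labels)):
        sums, counts = {}, {}
        for j, (value, label) in enumerate(zip(values, labels)):
            if j == i:
                continue  # the held-out dataset plays no part in the fit
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
        means = {label: sums[label] / counts[label] for label in sums}
        predicted = min(means, key=lambda label: (x - means[label]) ** 2)
        correct += predicted == true_label
    return correct / len(values)
```

With three models, a returned rate near one-third corresponds to chance, and a rate near one to reliable assignment.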

Table

Proportions of datasets assigned correctly to constant, linearly increasing and linearly decreasing recombination models using jackknife cross-validation. Values of one-third indicate assignments no better than chance; values approaching one indicate improving assignment rates. 10^{4} datasets consisting of 10-kb of sequence for 100 individuals were generated under each model. Assignments were made using the mean, variance and maximum value of summary statistics for 10^{3} quartets for each dataset.

| Summary statistic | Mean | Variance | Maximum | Combined |
| --- | --- | --- | --- | --- |
| **S** | 0.36 | 0.32 | 0.36 | |
| **R_{min}** | 0.43 | 0.45 | 0.38 | |
| **rmmg** | 0.43 | 0.43 | 0.45 | |
| **nHaps** | 0.60 | 0.59 | 0.33 | |
| **HapDiv** | 0.59 | 0.57 | 0.33 | |
| **Wall’s B** | 0.41 | 0.47 | 0.34 | |
| **Wall’s Q** | 0.40 | 0.37 | 0.33 | |
| **Hudson’s C** | 0.43 | 0.42 | 0.33 | |
| **Z_{nS}** | 0.48 | 0.34 | 0.48 | |
| **All unscaled** | 0.66 | 0.65 | 0.52 | 0.68 |
| **… S** | 0.36 | 0.36 | 0.34 | |
| **… S** | 0.43 | 0.41 | 0.45 | |
| **… S** | 0.32 | 0.36 | 0.36 | |
| **… S** | 0.32 | 0.36 | 0.36 | |
| **… S** | 0.37 | 0.33 | 0.31 | |
| **… S** | 0.37 | 0.34 | 0.32 | |
| **… S** | 0.42 | 0.37 | 0.37 | |
| **… S** | 0.42 | 0.34 | 0.43 | |
| **All scaled** | 0.64 | 0.59 | 0.53 | 0.67 |
| **All combined** | | | | 0.71 |

Scaling subsamples by

These assignments were obtained using information about the amount of recombination in each

**Determining the optimal scaling factor to capture n-tuple age.**

Correlations between recombination summary statistics scaled by

Coalescent theory tells us that the power to detect recombination events should decline exponentially into the past (see details later). Therefore, the linearly increasing and decreasing models are mostly dominated by low and high recombination rates, respectively, while the constant model is intermediate. We were concerned that our cross-validation test might simply be detecting low, medium and high recombination rates rather than distinguishing constant recombination from recombination rates that change through time. We therefore repeated the cross-validation test with four recombination models: constant high, constant low, linearly increasing and linearly decreasing recombination rates. Assignment accuracy was only slightly lower than for the three-model test (64% vs 71%). We conclude that

Effect of

Thus far,

**Effect of subsample (n-tuple) size on assignment accuracy.** Subsample sizes range from 4 to the sample size (

As before, assignment rates started at 71% for quartets, initially improved with increasing

Assignment accuracy was maximized at 84% across all analyses performed here. Although considerably better than chance, this still leaves an error rate of 16%. Because power is relatively modest, reconstructing historic recombination rates for real genomic loci is expected to remain difficult even when

Discussion

We show that information about past changes in recombination rate can be extracted from genomic data using a suite of summary statistics coupled with lineage subsampling to provide proxy information about recombination events at different time depths. Simulated datasets can be correctly assigned to different models of historic recombination with high accuracy (84%).

Why is the power of

**Relationship between time and number of lineages under the coalescent.** (**A**) Expected coalescent times for 2–5 lineages in units of **B**) Representative coalescent genealogy. Note that many lineages exist to record events in the recent past, while few lineages remain to represent older time points. Only recombination involving an extant lineage (shaded points) can be observed today. The probability that recombination involves an extant lineage is high in the present, where many shaded points exist, but declines exponentially into the past, where shaded points are scarce.

Put more formally, the coalescent times T_{i} of

In the sampling limit T_{2}) takes, on average, half the time to the most recent common ancestor of the sampled dataset (Figure
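The "half the time" statement can be checked against the standard expectations of Kingman's coalescent. In coalescent time units, the expected time during which i lineages persist is 1 / C(i, 2) = 2 / (i(i − 1)), and summing over i = 2, …, n gives an expected time to the most recent common ancestor of 2(1 − 1/n). The sketch below computes these quantities exactly; the function names are illustrative, and the conversion from coalescent units to generations is deliberately left out because it depends on the population model.

```python
from fractions import Fraction


def expected_coalescent_time(i):
    """E[T_i]: expected time (coalescent units) while i lineages remain."""
    if i < 2:
        raise ValueError("at least two lineages are needed to coalesce")
    return Fraction(2, i * (i - 1))


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n lineages."""
    return sum(expected_coalescent_time(i) for i in range(2, n + 1))


def share_of_deepest_interval(n):
    """Fraction of the expected tree height spent with only two lineages."""
    return expected_coalescent_time(2) / expected_tmrca(n)
```

As n grows, `expected_tmrca(n)` approaches 2 while `expected_coalescent_time(2)` stays at 1, so `share_of_deepest_interval(n)` approaches one half from above. Few lineages therefore remain to record events over the older half of the genealogy.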

Further, there is a high probability of observing these two deepest branches, even with very small subsample sizes

Given four randomly chosen subsamples (i.e., a quartet),

Conclusions

A natural limit places important constraints on our ability to reconstruct past changes in recombination rates. If the change occurred recently, sufficient extant lineages may still record the event, and

Methods

Simulations

The coalescent simulation software

Genetic datasets were simulated using Kingman’s N_{e} = 10^{4}) (i.e., the estimated global effective population size of modern humans) ^{-8} events/bp/generation 10^{4} times for each model.

The recombination rate was either held constant, or allowed to vary linearly, exponentially or logistically through time for 10^{4} generations (cf.^{-4}, and logistic rates were fitted to a curve with ^{-4}. (Note that these curves are for exploratory purposes only. They are not intended to represent real rates of change in human populations). The total amount of recombination was constrained so as to be identical for all models, but was apportioned through time according to the constant, linear, exponential and logistic distributions described above. Overall population recombination rates (i.e., _{e}

To infer past recombination rates, samples were taken at a single time point and surveyed using ^{4} generations by taking 10^{4} independent coalescent simulations at each of 500 20-generation intervals.
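The apportioning of a fixed recombination total across 500 intervals of 20 generations can be sketched as follows. This is a minimal sketch under stated assumptions: the `growth` parameter and the logistic midpoint are hypothetical placeholders (the curve parameters are not recoverable from the text above), intervals are ordered oldest first, and each profile is expressed as the fraction of total recombination falling in each interval.

```python
import math


def rate_profiles(n_intervals=500, growth=0.01, midpoint=None):
    """Relative recombination per interval, oldest first, each summing to 1.

    `growth` and `midpoint` are illustrative values, not those of the study.
    Because every profile sums to one, multiplying by a common total keeps
    the overall amount of recombination identical across models.
    """
    mid = (n_intervals - 1) / 2 if midpoint is None else midpoint
    steps = range(n_intervals)
    shapes = {
        "constant": [1.0 for _ in steps],
        "linear": [t + 1.0 for t in steps],
        "exponential": [math.exp(growth * t) for t in steps],
        "logistic": [1.0 / (1.0 + math.exp(-growth * (t - mid))) for t in steps],
    }
    profiles = {}
    for name, shape in shapes.items():
        total = sum(shape)
        profiles[name] = [value / total for value in shape]
    return profiles
```

The profiles returned here increase towards the present; reversing a list (for example `list(reversed(rate_profiles()["linear"]))`) gives the corresponding decreasing model with the same total.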

Summary statistics

Summary statistics were calculated using functions from the libsequence library _{e}
R_{min}, the minimum number of recombination events calculated from observed four-gamete violations _{min} proposed by Myers and Griffiths (equation four in _{e}
Z_{nS}, the mean pairwise r^{2} estimate of linkage disequilibrium across all polymorphic sites
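To make the simpler summaries concrete, the sketch below computes the number of segregating sites, the number of distinct haplotypes, haplotype diversity and the mean pairwise r^{2} across polymorphic sites. It is not the libsequence code used in the study. The function names are hypothetical, haplotypes are assumed to be equal-length strings of biallelic `0`/`1` characters, and haplotype diversity uses the common n/(n − 1) · (1 − Σp²) form, which may differ from the library's definition.

```python
from collections import Counter
from itertools import combinations


def polymorphic_columns(haps):
    """Alignment columns (as tuples) that carry more than one allele."""
    return [column for column in zip(*haps) if len(set(column)) > 1]


def segregating_sites(haps):
    return len(polymorphic_columns(haps))


def n_haplotypes(haps):
    return len(set(haps))


def haplotype_diversity(haps):
    """Unbiased haplotype diversity: n / (n - 1) * (1 - sum of p_i squared)."""
    n = len(haps)
    homozygosity = sum((count / n) ** 2 for count in Counter(haps).values())
    return n / (n - 1) * (1 - homozygosity)


def r_squared(col_a, col_b):
    """Squared correlation between alleles at two polymorphic 0/1 sites."""
    n = len(col_a)
    p_a = col_a.count("1") / n
    p_b = col_b.count("1") / n
    p_ab = sum(a == "1" and b == "1" for a, b in zip(col_a, col_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def mean_pairwise_r_squared(haps):
    """Mean r^2 over all pairs of polymorphic sites (NaN if fewer than two)."""
    pairs = list(combinations(polymorphic_columns(haps), 2))
    if not pairs:
        return float("nan")
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)
```

Two sites in complete association, as in `["00", "00", "11", "11"]`, give a mean r^{2} of 1, whereas the fully shuffled sample `["00", "01", "10", "11"]` gives 0.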

Statistics

Correlations between scaled and unscaled summary statistics, and discriminant analyses, were calculated using the statistical software

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MPC, BRH, MCW and JS conceived and designed the experiments. MPC performed the experiments. MPC, BRH and JS analyzed the data. MPC drafted the manuscript. All authors have read and approved the final manuscript.

Acknowledgments

We thank Richard Hudson (University of Chicago) for suggesting how to simulate changing recombination rates using