Selection dramatically reduces effective population size in HIV-1 infection

Liu, Yi; Mittler, John E

doi:10.1186/1471-2148-8-133

Research article
Open access
Published: 03 May 2008

BMC Evolutionary Biology volume 8, Article number: 133 (2008)

Abstract

Background

In HIV-1 evolution, a 100–100,000 fold discrepancy between census size and effective population size (N_e) has been noted. Although it is well known that selection can reduce N_e, high in vivo mutation and recombination rates complicate attempts to quantify the effects of selection on HIV-1 effective size.

Results

We use the inbreeding coefficient and the variance in allele frequency at a linked neutral locus to estimate the reduction in N_e due to selection in the presence of mutation and recombination. With biologically realistic mutation rates, the reduction in N_e due to selection is determined by the strength of selection, i.e., the stronger the selection, the greater the reduction. However, the dependence of N_e on selection can break down if recombination rates are very high (e.g., r ≥ 0.1). With biologically likely recombination rates, our model suggests that recurrent selective sweeps similar to those observed in vivo can reduce within-host HIV-1 effective population sizes by a factor of 300 or more.

Conclusion

Although other factors, such as unequal viral reproduction rates and limited migration between tissue compartments, contribute to reductions in N_e, our model suggests that recurrent selection plays a significant role in reducing HIV-1 effective population sizes in vivo.

Background

The effective population size, N_e, is defined as the size of an idealized population that has the same population genetics properties (generally those properties that measure the magnitude of random genetic drift) as the actual population. Most studies have estimated the within-host N_e for HIV-1 during chronic infection to be ~10³ [1–5], though one study estimated N_e to be between 10⁵ and 5 × 10⁵ [6]. Even the highest of these estimates is about two orders of magnitude lower than the number of productively infected cells, estimated to be on the order of 10⁷ to 10⁸ cells [7]. Explanations for low N_e values include unequal viral reproduction rates [2–5, 8], structured populations [8–12], and recurrent selection [2–5, 8]. The possibility that recurring selection may be reducing viral diversity is unsettling because most of the computational models used to estimate N_e assume neutral evolution.

During a selective sweep of a favorable allele, any neutral alleles linked to the selected allele will rise in frequency and become overrepresented in the population. This process, called "hitchhiking", can reduce neutral diversity more than random genetic drift and therefore reduce N_e [13]. Although selection has been acknowledged as a possible explanation for the low within-host effective population size during chronic HIV-1 infection [3, 12], high mutation [14, 15] and recombination rates [16–20] complicate attempts to study the effects of selection on HIV-1 in vivo. To address these issues, we extended a classic "inbreeding coefficient" method [21–23] to derive recurrence equations that account for the combined effects of selection, mutation, and recombination. We then used these equations to quantify the effects of selection on effective size using parameters relevant to HIV-1 evolution in vivo.

Results and Discussion

Overview of the genetic model

Our model follows the basic Wright-Fisher assumptions of a single haploid population of constant size with no subdivision or migration, non-overlapping generations, and random sampling of offspring each generation. We calculated N_e in terms of the inbreeding effective size, which is based on the change of the average inbreeding coefficient (F) at a neutral locus (L) that is linked to a locus (S) that is under selection. The inbreeding coefficient is defined as the probability that two individuals are identical by descent (which means they are identical and have a common ancestor). Therefore, for the neutral locus L, two individuals are identical by descent if they are derived from a common ancestor and are identical at locus L, regardless of the status of locus S. Our approach to estimating N_e was to determine changes in the inbreeding coefficient at the neutral locus in the presence and absence of selection and recombination. The effective population size was defined as the size of the neutral population that gave changes in the inbreeding coefficient that were equal to those observed in the presence of selection and recombination.

As shown in Figure 1A, in the absence of recombination, an offspring can be derived from a parent in the previous generation with either allele a or A at locus S. An offspring with allele a can be derived by two pathways: from a parent with allele a (without mutation) or a parent with allele A (with an A to a mutation). An offspring with allele A can be derived by two similar pathways. Therefore, F_t (the value of F at time t) will be the sum of the probability that two offspring are derived from a certain combination of parents (both with allele A, both with allele a, and one with allele A and the other with allele a) times the probability that the offspring are identical by descent at locus L (see Appendix).

In the presence of recombination, loci L and S can be derived from different parents (Figure 1B). An offspring with allele a or A at locus S can be derived from one or more parents in the previous generation by the four pathways illustrated in Figure 1B. As above, F_t will be the sum of the probability that the two offspring are derived from a certain combination of pathways (both having locus S from parents with allele A, both having locus S from parents with allele a, one having locus S from a parent with allele a and the other having locus S from a parent with allele A) times the probability that the offspring derived from these pathways are identical by descent at locus L (see Appendix).

Effect of selection on effective population size

We used the ratio N/N_e to summarize the reduction in N_e due to selection from the start of selection at t = 0 until t = t_nearlyfixed [the time when the frequency of the advantageous allele reaches (N-1)/N]. This last approximation is helpful because fixation time is asymptotic, with the advantageous allele never reaching 100% in a deterministic model.

In the absence of mutation, the reduction in N_e due to selection was most strongly affected by the initial frequency of the advantageous allele, A₀ (Figure 2A). In the presence of mutation, the reduction in N_e due to selection was most sensitive to the selective advantage, s, of the advantageous allele (Figure 2B). Indeed, for a homogeneous population of N = 10⁷, the N/N_e ratio increased 6- to 9-fold with each 10-fold increase in the selection coefficient in the presence of mutation. However, recombination can break the hitchhiking effect of selection on N_e (Figure 2C). For example, when r ≤ 10^-3, for locus L with U = μ, selection with s = 0.1 reduced N_e by ~20-fold. In contrast, when r ≥ 0.1, selection with s = 0.1 had little effect on N_e.

Effective population sizes calculated from the inbreeding coefficient (inbreeding N_e) are usually the same as those calculated from the variance in the allele frequency (variance N_e), though exceptions do occur [24–27]. To validate our results, we estimated the effect of selection on N_e by calculating the variance in the frequency of the linked neutral allele from simulations using the same genetic model. Values for the inbreeding N_e obtained from the calculations above were generally consistent with the estimates of the variance N_e derived from these simulations (Figure 2A to 2C). We noted that there was an approximately 3-fold difference in the N_e values between the two methods when s = 0.01 (Figure 2B). This is likely because the inbreeding N_e was estimated using a strictly deterministic model, whereas the variance N_e was estimated from simulations with s = 0.01, where genetic drift plays a bigger role.

A very high mutation rate at the neutral locus L (e.g., U = 1000μ) also diminished the reduction in N_e due to selection (Figure 2D). In the absence of mutation, the effect of selection was insensitive to changes in the initial homogeneity at locus L (Figure 2E). In the presence of mutation, selection with an initially heterogeneous population at locus L caused greater reductions in N_e than selection with an initially homogeneous population. For F₀ less than 0.1, however, further increases in the initial heterogeneity (i.e., making F₀ even lower) did not lead to further reductions in N_e through selection. Interestingly, reductions in N_e/N due to selection were insensitive to changes in the census population size, N (Figure 2F).

Effect of recurrent selection on effective population size

For a homogeneous population under recurrent selection, the inbreeding coefficient of the neutral allele decreased until it reached a quasi-steady state, where it fluctuated in a regular "sawtooth" fashion (Figure 3A). The effect of recurrent selection on N_e was sensitive to selection strength. For example, for a homogeneous population of N = 10⁷ and U = μ, the decline of F_t over time under recurrent selection with s = 0.01 overlapped the neutral curve when N = 9,973,000, while the decline of F_t under recurrent selection with s = 0.1 overlapped the neutral curve when N = 28,220 (Figure 3A). In other words, recurrent selection had little effect on N_e when s = 0.01, while recurrent selection reduced N_e by over 300-fold when s = 0.1 (Figure 3A and 3B). This reduction in N_e by recurrent selection could be diminished by high recombination rates (Figure 3B). Although recombination had little impact on the reduction in N_e due to selection under a model with s = 0.1 and r ≤ 10^-3, with r ≥ 0.1, recombination completely broke the hitchhiking effects of selection on N_e with s = 0.1.

Conclusion

We examined the combined effects of selection, mutation, and recombination on the effective population size of a neutral locus that is linked to a locus under selection. Consistent with other studies [21–23], we found that selection can increase the inbreeding coefficient and reduce the inbreeding effective population size. Without mutation, this reduction is primarily determined by the initial frequency of the advantageous allele, i.e., the lower the initial frequency, the greater the effect. With mutation, this reduction is mostly determined by the strength of selection, i.e., the stronger the selection, the greater the effect. With moderate recombination rates (e.g., r ≤ 10^-3), recurrent selection can substantially lower N_e, though the hitchhiking effect disappears if the recombination rates are very high (e.g., r ≥ 0.1).

The effective population size of HIV-1 during chronic infection has been shown to be 100- to 100,000-fold lower than within-host census size. Indeed, CTL responses are a driving force of HIV-1 evolution and these responses continuously select for escape mutants during chronic infection [28–30]. In a comprehensive study of viral evolution and CTL responses during the first four years of HIV-1 infection in one subject, Liu et al. [30, 31] found that of the 25 epitopes detected in this subject, 17 were largely replaced by mutants over time. The selection coefficients for the CTL escape mutant(s) of a single epitope ranged from 0.2 to 0.4 during acute infection and from 0.0024 to 0.15 during chronic infection, with an average of 0.03 [30]. Humoral and escape-specific CTL responses impose additional selective pressures not quantified in Liu et al. [30, 31]. With low to moderate recombination rates, our model shows that recurrent selection with s = 0.1 reduces viral effective population size by approximately 300-fold. Therefore, during HIV-1 infection, selection alone is likely to reduce the viral effective population size to an N_e of ~10⁵. This result is close to the estimate of N_e ~ 5 × 10⁵ that Rouzine and Coffin [6] obtained from a model that accounts for selection. The small discrepancy may be due to their use of a lower mutation rate (10^-5 vs. 2.5 × 10^-5 in our study) and possible biased sampling of sites with higher underlying mutation rates in their study [5].

With high recombination rates, our model predicts that selection has little effect on N_e. Observations of 3 to 13 cross-over events per virion in vitro [17–20] suggest an intrinsic recombination rate of 10^-4 to 10^-3 per adjacent site per generation. However, this range is not relevant to our model since these estimates were obtained using heterozygous virions, which may not be abundant in vivo. While Jung and colleagues [32] have demonstrated that cells in the spleen are infected with multiple viruses (a prerequisite for the formation of heterozygous virions), they did not determine how often heterozygous virions are formed. More relevant are data in which SCID-HU mice were infected with a 50:50 mixture of two marked strains [19]. Two to three weeks after infection, an average of ~0.01% of infected cells carried a phenotypic marker of recombination (present on half of all recombinants). Conservatively assuming a single generation of recombination, we estimate from equation (11) in the Appendix that the probability of recombination between their two markers (which were 408 bp apart) was r ≈ p_Aa/(p_A p_a) = 0.0001/(0.5 × 0.5) = 4 × 10^-4 per virion per generation, a value too low to break the hitchhiking effects of selection in our model. However, we recognize these are approximate values obtained from a somewhat artificial system. HIV-1 evolution studies could benefit from additional studies of marked viruses in animal models and clever retrospective analyses of in vivo data from humans to determine evolutionarily relevant recombination rates.
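The arithmetic behind this estimate can be sketched in Python by rearranging equation (11) for r. This is only an illustration: the function name and the optional finite-N argument are assumptions introduced here, not part of the original analysis. For large N the factor N/(N - 1) is effectively 1, which gives the approximation quoted in the text.

```python
def estimate_recombination_rate(p_Aa, p_A, p_a, N=None):
    """Solve equation (11), p_Aa = p_A * r * N * p_a / (N - 1), for r.

    p_Aa     : fraction of genomes carrying the recombinant marker combination
    p_A, p_a : frequencies of the two parental strains
    N        : population size (hypothetical optional argument);
               None treats N as effectively infinite, so N / (N - 1) = 1
    """
    r = p_Aa / (p_A * p_a)
    if N is not None:
        r *= (N - 1) / N  # exact finite-N correction from equation (11)
    return r


# Values quoted in the text: ~0.01% recombinants from a 50:50 mixture.
r = estimate_recombination_rate(p_Aa=1e-4, p_A=0.5, p_a=0.5)
print(f"estimated r = {r:.1e} per virion per generation")
```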

Methods

Genetic model

We assume a Wright-Fisher model with a neutral locus L that is linked to a locus under selection, locus S. The selected locus has two alleles, an advantageous allele, A, with a fitness w = 1 + s, and a disadvantageous allele, a, with a fitness of 1. Allele A mutates to a at rate μ and allele a mutates to A at rate v, while neutral mutations at locus L occur at rate U. A description of all the characters, parameters, and variables used in this study is listed in Table 1. For the purposes of calculation, we assume the following parameters are known: the initial frequency of allele A (A₀); the initial frequency of allele a (a₀); and the initial inbreeding coefficient at locus L among all individuals (F₀), among individuals with allele A (F_AA,0), among individuals with allele a (F_aa,0), and between individuals with allele A and those with allele a (F_Aa,0).

Table 1 Description of characters, parameters and variables.

Parameters for HIV-1

The average mutation rate of HIV-1 has been estimated to be 2.5 × 10^-5 per nucleotide per generation [14], although one recent study estimated a higher mutation rate of ~8.5 × 10^-5 per site per generation [15]. Assuming that any nucleotide substitution at a defined nucleotide site shifts locus S from the advantageous to the disadvantageous state, we defined μ = 2.5 × 10^-5 per generation. Assuming that only a particular nucleotide substitution at this site increases fitness, we set v = μ/3. Since the census size of productively HIV-1 infected cells in vivo exceeds 10⁷ [7, 33], most of the comparisons in this study were with N = 10⁷. Since the accumulation of advantageous alleles in populations is more stochastic as N decreases, we only examined populations with N ≥ 10⁶.

Effect of selection on effective population size without mutation

Under selection, the inbreeding coefficient of the linked neutral locus will increase faster than expected by random genetic drift until the selected advantageous allele is fixed (A_t = 100%). Because we are using a deterministic model, fixation time is asymptotic. To quantify the effect of selection, we determined the average time for an advantageous allele to approach fixation, t_nearlyfixed, and the value of F at t_nearlyfixed. t_nearlyfixed can be calculated from $t = \log (\frac{A_{t} a_{0}}{a_{t} A_{0}}) / \log (w)$, where t is the time just before the favored allele A at locus S becomes fixed; i.e., when $A_{t} = \frac{N - 1}{N}$ and $a_{t} = \frac{1}{N}$. F was calculated with μ = 0, v = 0, and U = 0. The corresponding N_e is defined here as the population size under neutrality that will increase F from F₀ to $F_{t_{nearlyfixed}}$ between t = 0 and t = t_nearlyfixed. We determined the corresponding N_e under the following conditions: N = 10⁶ to 10⁹; s = 0.01 to 10; A₀ = 10^-7 to 10^-3; and F₀ = F_AA,0 = F_aa,0 = F_Aa,0 = 10^-4 to 0.8 (if F₀ = 1, F will not change over time without mutation, regardless of selection).
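As a minimal sketch of the t_nearlyfixed formula above, the following Python code evaluates the closed form and cross-checks it against direct iteration of the deterministic, mutation-free frequency recursion A_t = wA_t-1/(wA_t-1 + a_t-1). The function names are illustrative assumptions, not names used in the study.

```python
import math


def t_nearly_fixed(N, s, A0):
    """Closed-form time for allele A to rise from A0 to (N - 1)/N (no mutation)."""
    w = 1.0 + s
    a0 = 1.0 - A0
    A_t = (N - 1) / N   # frequency of A at "near fixation"
    a_t = 1.0 / N       # frequency of a at "near fixation"
    return math.log((A_t * a0) / (a_t * A0)) / math.log(w)


def frequency_after(A0, s, generations):
    """Iterate the deterministic selection recursion for a whole number of generations."""
    w = 1.0 + s
    A = A0
    for _ in range(generations):
        A = w * A / (w * A + (1.0 - A))
    return A


N, s, A0 = 1e7, 0.1, 1e-7
t = t_nearly_fixed(N, s, A0)
print(f"t_nearlyfixed is about {t:.1f} generations")
# The first whole generation at or beyond t should have crossed (N - 1)/N.
print(frequency_after(A0, s, math.ceil(t)) >= (N - 1) / N)
```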

Effect of selection on effective population size with mutation and recombination

The frequency of the A allele cannot be maintained at 100% with the occurrence of the back mutation from A to a at locus S. Therefore t_nearlyfixed was set to the time that A_t and a_t reached equilibrium, i.e., when A_t = A_t+1. The corresponding N_e, the population size under neutrality that will increase F from F₀ to $F_{t_{nearlyfixed}}$ between t = 0 and t = t_nearlyfixed, was determined using numerical iteration [Appendix equation (2)]. We determined the corresponding N_e under the following conditions: N = 10⁶ to 10⁹; s = 0.01 to 10; A₀ = 0 to 10^-3; F₀ = F_aa,0 = 10^-4 to 1; F_AA,0 = F_Aa,0 = 0, if A₀ = 0 and F_AA,0 = F_Aa,0 = F₀, if A₀ > 0; μ = 2.5 × 10^-5, v = μ/3, U = μ to 1000μ, and r = 0 to 1. With these high advantageous mutation rates and large population sizes (Nv >> 1), individuals with allele a had mutations to allele A in almost every generation, preventing advantageous allele A from being lost from the population due to genetic drift.

Effect of recurrent selection on effective population size

With the fixation of the advantageous allele A, the inbreeding coefficient of locus L will undergo a nearly neutral change unless new alleles linked to locus L become advantageous. To estimate the effect of recurrent selection on N_e, we assumed that all loci under selection are linked to locus L in the absence of recombination. We also assumed that each selected locus was under sequential selection, i.e., when the frequency of an advantageous allele reached 99.9% at generation t, we assumed that another locus started to undergo selection (calculated by setting A_t = 0, F_{aa, t} = F_t, F_{AA, t} = 0, and F_{Aa, t} = 0). For simplicity, we assumed that all of the selected loci have the same mutation rate and selection coefficient. We calculated F under recurrent selection under the following conditions: N = 10⁷, A₀ = 0, F_AA,0 = F_Aa,0 = 0, F₀ = F_aa,0 = 1, μ = 2.5 × 10^-5, v = μ/3, and U = μ; s = 0.01 to 10; and r = 0 to 1.

Estimate of the effect of selection on variance effective population size by simulation

The change in the average inbreeding coefficient is one of several criteria used to estimate effective population size [24–27]. To validate our results using a different measure of effective population size, we estimated the effect of selection on N_e by calculating the variance in the frequency of the linked neutral allele from simulations using the genetic model described above. The parameters used in these simulations were the same as those used for the calculation for the inbreeding coefficient described above. When simulating selection in the absence of mutation, the simulations were performed under the following conditions: N = 10⁷; s = 0.01 to 10; A₀ = 10^-7 to 10^-3; F₀ = F_AA,0 = F_aa,0 = F_Aa,0 = 0.1; μ = v = U = 0; and r = 0. When simulating selection in the presence of mutation, the simulations were performed with the following conditions: N = 10⁷; s = 0.01 to 10; A₀ = 0; F₀ = F_aa,0 = 1, F_AA,0 = F_Aa,0 = 0; μ = 2.5 × 10^-5, v = μ/3, and U = μ; and r = 0 to 1. Since the deterministic model assumes an infinite population size, we only examined a large population size of 10⁷. Each condition was simulated 100,000 times. We calculated the variance of the allele frequency at the linked neutral locus L at the corresponding t_nearlyfixed. Under neutrality in the absence of mutation, the allele frequency variance can be calculated by $p (1 - p) [1 - {(1 - \frac{1}{N})}^{t}]$ [34]. Therefore, the population size under neutrality (N_e) that has the same variance in allele frequency as the population under selection can be determined using numerical iteration. In the presence of mutation, we used simulation to determine the range of the population size under neutrality. These were used to determine the range of allele frequency variances that matched the frequency variance under selection at the corresponding t_nearlyfixed.
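For the mutation-free case, the neutral variance formula above is monotonic in N, so the matching population size can also be found by rearranging the formula algebraically. The Python sketch below shows that rearrangement as an illustrative alternative to the numerical iteration described in the text; the function names are hypothetical, and the approach does not cover the case with mutation, where the text relies on simulation.

```python
def neutral_variance(p, N, t):
    """Allele-frequency variance after t generations of neutral drift, no mutation."""
    return p * (1.0 - p) * (1.0 - (1.0 - 1.0 / N) ** t)


def variance_Ne(observed_variance, p, t):
    """Population size whose neutral variance at generation t equals observed_variance.

    Obtained by solving var = p(1 - p)[1 - (1 - 1/N)^t] for N.
    Requires 0 < observed_variance < p(1 - p).
    """
    scaled = observed_variance / (p * (1.0 - p))
    if not 0.0 < scaled < 1.0:
        raise ValueError("variance must lie strictly between 0 and p(1 - p)")
    return 1.0 / (1.0 - (1.0 - scaled) ** (1.0 / t))


# Round trip: a variance generated with a known N should map back to that N.
var = neutral_variance(p=0.5, N=28220, t=300)
print(round(variance_Ne(var, p=0.5, t=300)))
```

In practice `observed_variance` would be the variance of the neutral allele frequency across replicate simulations at t_nearlyfixed, and `p` the initial frequency of that allele.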

Appendix

Recurrence equation for F in the absence of selection

In the absence of mutation or selection, the inbreeding coefficient is

F_{t} = \frac{1}{N} + (1 - \frac{1}{N}) F_{t - 1} = 1 - {(1 - \frac{1}{N})}^{t} (1 - F_{0})

(1)

where t is time in generations and N is the population size [26]. $\frac{1}{N}$ gives the probability that two offspring are derived from the same parent, in which case the probability of them being identical by descent is 1. $(1 - \frac{1}{N})$ is the probability that two offspring are derived from different parents, in which case the probability of them being identical by descent is F_t-1. In the presence of mutation, $F_{t} = [\frac{1}{N} + (1 - \frac{1}{N}) F_{t - 1}] {(1 - U)}^{2}$ [35]. To obtain F_t in terms of F₀, let $α = \frac{1}{N} \times {(1 - U)}^{2}$, and $β = (1 - \frac{1}{N}) \times {(1 - U)}^{2}$. This gives

F₁ = α + βF₀

F₂ = α + βF₁ = α + β × (α + βF₀) = α + αβ + β²F₀

F₃ = α + βF₂ = α + β × (α + αβ + β²F₀) = α + αβ + αβ² + β³F₀

F_t = α + αβ + αβ² + αβ³ + ... + αβ^(t-1) + β^t F₀.

The formula 1 + x + ... + x^(n-1) = (1 - xⁿ)/(1 - x) gives the following:

F_t = α (1 - β^t)/(1 - β) + β^t F₀. (2)

As t approaches infinity, F converges to the equilibrium $\hat{F} = \frac{α}{(1 - β)} \approx \frac{1}{1 + 2 N U}$ , as shown previously by Kimura and Crow [35].
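The following Python sketch illustrates how equation (2) can be used to find the neutral population size that reproduces a given inbreeding coefficient, which is how N_e is defined in this study. The bisection search and all function names are assumptions made for illustration; the text states only that numerical iteration was used. The search relies on F_t being non-increasing in N for F_t-1 ≤ 1, which follows from the one-step recurrence.

```python
import math


def neutral_F(t, N, F0, U=0.0):
    """Equation (2): F after t generations of drift with neutral mutation rate U."""
    alpha = (1.0 / N) * (1.0 - U) ** 2
    beta = (1.0 - 1.0 / N) * (1.0 - U) ** 2
    return alpha * (1.0 - beta**t) / (1.0 - beta) + beta**t * F0


def neutral_F_iterated(t, N, F0, U=0.0):
    """Same quantity from the one-step recurrence, as a check on the closed form."""
    F = F0
    for _ in range(t):
        F = (1.0 / N + (1.0 - 1.0 / N) * F) * (1.0 - U) ** 2
    return F


def matching_Ne(F_target, t, F0, U=0.0, lo=1.0, hi=1e12, iterations=200):
    """Bisection on a log scale for the N at which neutral_F equals F_target."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if neutral_F(t, mid, F0, U) > F_target:
            lo = mid  # too much inbreeding: the neutral population must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Round trip: F generated with N = 5000 should map back to N_e of about 5000.
F_obs = neutral_F(t=100, N=5000, F0=0.1)
print(round(matching_Ne(F_obs, t=100, F0=0.1)))
```

In the setting of this study, `F_target` would be the value of F at t_nearlyfixed obtained from the recurrence equations with selection.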

Recurrence equations for F in the presence of a selected locus without recombination

In the presence of selection, the F value at time t is the sum of the probability of two offspring being derived from parents having alleles AA, aa, or Aa at locus S multiplied by the probability that the offspring will be identical by descent at locus L. In other words:

F_{t} = \{p_{A, t}^{2} [\frac{1}{N A_{t - 1}} + (1 - \frac{1}{N A_{t - 1}}) F_{A A, t - 1}] + p_{a, t}^{2} [\frac{1}{N a_{t - 1}} + (1 - \frac{1}{N a_{t - 1}}) F_{a a, t - 1}] + 2 p_{A, t} p_{a, t} F_{A a, t - 1}\} {(1 - U)}^{2}.

(3)

Here, A_t-1 and a_t-1 are the frequencies of the advantageous and disadvantageous alleles at locus S at generation t-1. $p_{A, t} = \frac{w A_{t - 1}}{w A_{t - 1} + a_{t - 1}}$ and $p_{a, t} = \frac{a_{t - 1}}{w A_{t - 1} + a_{t - 1}}$ give the probabilities that an offspring at generation t is derived from a parent at generation t-1 with allele A or a, respectively. F_AA, F_Aa, and F_aa give the probabilities that parents with the indicated alleles will be identical by descent at locus L. Given that both parents have allele A or a at locus S, the $\frac{1}{N A_{t - 1}}$ and $\frac{1}{N a_{t - 1}}$ terms respectively give the probabilities that two offspring have the same parent (in which case the probability of being identical by descent at locus L, in the absence of mutation, is 1). The $1 - \frac{1}{N A_{t - 1}}$ and $1 - \frac{1}{N a_{t - 1}}$ terms give the probability that the two offspring came from different parents (in which case the probabilities of identity by descent at locus L, in the absence of mutation, are F_{AA, t-1} and F_{aa, t-1} respectively). The term (1-U)² accounts for the fact that two individuals cannot be identical by descent if there is a mutation at the neutral locus L.

If the parameters w, μ, v, U, A₀, a₀, F₀, F_AA,0, F_aa,0, and F_Aa,0 are known, F₁ can be calculated using equation (3). In addition, F_AA,1, F_aa,1, and F_Aa,1 can be calculated using the following equations:

\begin{array}{l} F_{A A, t} = \{{(p_{A \to A, t})}^{2} [\frac{1}{N A_{t - 1}} + (1 - \frac{1}{N A_{t - 1}}) F_{A A, t - 1}] + {(p_{a \to A, t})}^{2} [\frac{1}{N a_{t - 1}} + (1 - \frac{1}{N a_{t - 1}}) F_{a a, t - 1}] \\ + 2 p_{A \to A, t} p_{a \to A, t} F_{A a, t - 1}\} {(1 - U)}^{2}, \end{array}

(4)

\begin{array}{l} F_{a a, t} = \{{(p_{a \to a, t})}^{2} [\frac{1}{N a_{t - 1}} + (1 - \frac{1}{N a_{t - 1}}) F_{a a, t - 1}] + {(p_{A \to a, t})}^{2} [\frac{1}{N A_{t - 1}} + (1 - \frac{1}{N A_{t - 1}}) F_{A A, t - 1}] \\ + 2 p_{A \to a, t} p_{a \to a, t} F_{A a, t - 1}\} {(1 - U)}^{2} \end{array}

(5)

and

F_{A a, t} = (p_{A \to A, t} p_{a \to a, t} F_{A a, t - 1} + p_{a \to a, t} p_{a \to A, t} F_{a a, t - 1} + p_{A \to a, t} p_{A \to A, t} F_{A A, t - 1} + p_{A \to a, t} p_{a \to A, t} F_{A a, t - 1}) {(1 - U)}^{2}

(6)

where p_{x→y, t} is the probability that a sampled offspring is descended from a parent with allele x given that the offspring has allele y at locus S. The reasoning behind equations (4) – (6) is similar to that for equation (3). For each offspring, p_{x→y, t} can be calculated as the probability of the parent having allele x at locus S multiplied by the probability that x mutates to y (or fails to mutate, if x = y), divided by the probability that the offspring is y. In other words,

\begin{matrix} p_{A \to A, t} = \frac{w \times A_{t - 1}}{w \times A_{t - 1} + a_{t - 1}} (1 - μ) / A_{t}, & p_{a \to A, t} = \frac{a_{t - 1}}{w \times A_{t - 1} + a_{t - 1}} v / A_{t} \end{matrix}

and

\begin{matrix} p_{a \to a, t} = \frac{a_{t - 1}}{w \times A_{t - 1} + a_{t - 1}} (1 - v) / a_{t}, & p_{A \to a, t} = \frac{w \times A_{t - 1}}{w \times A_{t - 1} + a_{t - 1}} μ / a_{t} \end{matrix}

where A_t and a_t are given by

A_{t} = \frac{w \times A_{t - 1}}{w \times A_{t - 1} + a_{t - 1}} (1 - μ) + \frac{a_{t - 1}}{w \times A_{t - 1} + a_{t - 1}} v

(7)

a_{t} = \frac{a_{t - 1}}{w \times A_{t - 1} + a_{t - 1}} (1 - v) + \frac{w A_{t - 1}}{w \times A_{t - 1} + a_{t - 1}} μ

(8)

When A₁, a₁, F₁, F_AA,1, F_aa,1, and F_Aa,1 are available, we can calculate F₂ using equation (3), and F_AA,2, F_aa,2, and F_Aa,2 using equations (4) to (6). Therefore, F_t can be obtained by iteration.
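The iteration described above can be sketched in Python as follows. This is an illustrative reading of equations (3) to (8), not the authors' code: the function names are hypothetical, the same-parent probability 1/(N × frequency) is capped at 1, and an empty allele class contributes 0 (both assumptions added here to keep the sketch well defined at very low frequencies). The helper `sweep_without_mutation` covers only the mutation-free case, where near fixation is defined as A reaching (N - 1)/N.

```python
def coancestry(F_prev, n_parents):
    """Bracketed term of equation (3): 1/n + (1 - 1/n) * F_prev.

    Sketch assumptions: 1/n is capped at 1, and an empty class gives 0.
    """
    if n_parents <= 0:
        return 0.0
    same_parent = min(1.0, 1.0 / n_parents)
    return same_parent + (1.0 - same_parent) * F_prev


def step(A, F_AA, F_aa, F_Aa, N, s, mu=0.0, v=0.0, U=0.0):
    """Advance one generation; returns (A_t, F_t, F_AA_t, F_aa_t, F_Aa_t)."""
    a = 1.0 - A
    w = 1.0 + s
    mean_w = w * A + a
    p_A, p_a = w * A / mean_w, a / mean_w      # parent-allele probabilities
    A_new = p_A * (1 - mu) + p_a * v           # equation (7)
    a_new = p_a * (1 - v) + p_A * mu           # equation (8)
    G_AA = coancestry(F_AA, N * A)
    G_aa = coancestry(F_aa, N * a)
    keep = (1.0 - U) ** 2                      # no mutation at locus L in either lineage
    F = (p_A**2 * G_AA + p_a**2 * G_aa + 2 * p_A * p_a * F_Aa) * keep  # equation (3)
    # p_x_to_y: parent had allele x given that the offspring has allele y
    p_A_to_A = p_A * (1 - mu) / A_new if A_new > 0 else 0.0
    p_a_to_A = p_a * v / A_new if A_new > 0 else 0.0
    p_a_to_a = p_a * (1 - v) / a_new if a_new > 0 else 0.0
    p_A_to_a = p_A * mu / a_new if a_new > 0 else 0.0
    F_AA_new = (p_A_to_A**2 * G_AA + p_a_to_A**2 * G_aa
                + 2 * p_A_to_A * p_a_to_A * F_Aa) * keep              # equation (4)
    F_aa_new = (p_a_to_a**2 * G_aa + p_A_to_a**2 * G_AA
                + 2 * p_A_to_a * p_a_to_a * F_Aa) * keep              # equation (5)
    F_Aa_new = (p_A_to_A * p_a_to_a * F_Aa + p_a_to_a * p_a_to_A * F_aa
                + p_A_to_a * p_A_to_A * F_AA
                + p_A_to_a * p_a_to_A * F_Aa) * keep                  # equation (6)
    return A_new, F, F_AA_new, F_aa_new, F_Aa_new


def sweep_without_mutation(N, s, A0, F0):
    """Iterate until A reaches (N - 1)/N; returns (generations, F)."""
    A, F_AA, F_aa, F_Aa = A0, F0, F0, F0
    F, t = F0, 0
    while A < (N - 1) / N:
        A, F, F_AA, F_aa, F_Aa = step(A, F_AA, F_aa, F_Aa, N, s)
        t += 1
    return t, F


t, F_selected = sweep_without_mutation(N=1e7, s=0.1, A0=1e-5, F0=0.1)
F_neutral = 1.0 - (1.0 - 1.0 / 1e7) ** t * (1.0 - 0.1)  # equation (1), same t
print(t, F_selected > F_neutral)
```

With s = 0 and equal starting values for F_AA, F_aa, and F_Aa, a single call to `step` reduces to the neutral recurrence of equation (1), which is a convenient sanity check.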

Recurrence equations for F in the presence of a selected locus and recombination

Assuming that loci L and S recombine with a probability r per generation, we obtain

p_{A A, t} = p_{A, t} (1 - r) + p_{A, t} r \frac{N p_{A, t} - 1}{N - 1}

(9)

p_{a a, t} = p_{a, t} (1 - r) + p_{a, t} r \frac{N p_{a, t} - 1}{N - 1}

(10)

p_{A a, t} = p_{a A, t} = p_{A, t} r \frac{N p_{a, t}}{N - 1}

(11)

where p_{xy, t} is the probability that an individual at generation t has a neutral locus L derived from an individual with allele x at locus S, and a selected locus S derived from an individual with allele y at locus S (x and y can be A or a). This probability is the sum of the probability of no recombination and the probability of recombination between individuals with the indicated alleles at locus S. The -1's in the (Np_{x, t} - 1) and (N - 1) terms above account for the fact that a haploid individual cannot recombine with itself.
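A short Python sketch of equations (9) to (11) is given below, mainly to show that the pathway probabilities are consistent with the marginal probabilities p_{A, t} and p_{a, t}: the two pathways that take locus L from an A parent sum to p_{A, t}, and likewise for a. The function name and dictionary keys are illustrative assumptions.

```python
def pathway_probabilities(p_A, p_a, r, N):
    """Equations (9)-(11): key "xy" means locus L from an x parent, locus S from a y parent."""
    scale = r / (N - 1)
    p_AA = p_A * (1 - r) + p_A * scale * (N * p_A - 1)  # equation (9)
    p_aa = p_a * (1 - r) + p_a * scale * (N * p_a - 1)  # equation (10)
    p_Aa = p_A * scale * N * p_a                        # equation (11); equals p_aA
    return {"AA": p_AA, "aa": p_aa, "Aa": p_Aa, "aA": p_Aa}


probs = pathway_probabilities(p_A=0.3, p_a=0.7, r=0.01, N=1e7)
# Marginals are recovered: L from an A parent, and L from an a parent.
print(abs(probs["AA"] + probs["Aa"] - 0.3) < 1e-12)
print(abs(probs["aa"] + probs["aA"] - 0.7) < 1e-12)
```

This marginal identity is what allows the first form of the recombination recurrence for F_t to collapse to the same expression as equation (3).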

Similar to equations (3) – (6), with recombination,

\begin{array}{l} F_{t} = \left\{\begin{array}{l} (p_{A A, t}^{2} + p_{A a, t}^{2} + 2 p_{A A, t} p_{A a, t}) [\frac{1}{N A_{t - 1}} + (1 - \frac{1}{N A_{t - 1}}) F_{A A, t - 1}] \\ + (p_{a A, t}^{2} + p_{a a, t}^{2} + 2 p_{a A, t} p_{a a, t}) [\frac{1}{N a_{t - 1}} + (1 - \frac{1}{N a_{t - 1}}) F_{a a, t - 1}] \\ + 2 (p_{A A, t} p_{a A, t} + p_{A A, t} p_{a a, t} + p_{a A, t} p_{A a, t} + p_{a a, t} p_{A a, t}) F_{A a, t - 1} \end{array}\right\} {(1 - U)}^{2} \\ = \{p_{A, t}^{2} [\frac{1}{N A_{t - 1}} + (1 - \frac{1}{N A_{t - 1}}) F_{A A, t - 1}] + p_{a, t}^{2} [\frac{1}{N a_{t - 1}} + (1 - \frac{1}{N a_{t - 1}}) F_{a a, t - 1}] + 2 p_{A, t} p_{a, t} F_{A a, t - 1}\} {(1 - U)}^{2} \end{array}

(12)

\begin{array}{l} F_{A A, t} = \{{(\frac{p_{A A} (1 - μ) + p_{A a} v}{A_{t}})}^{2} [\frac{1}{N A_{t - 1}} + (1 - \frac{1}{N A_{t - 1}}) F_{A A, t - 1}] \\ +(\frac{p_{a A} (1 - μ) + p_{a a} v}{A_{t}})^{2} [\frac{1}{N a_{t - 1}} + (1 - \frac{1}{N a_{t - 1}}) F_{a a, t - 1}] \\ + \frac{2[p_{A A} (1 - μ) \times p_{a A} (1 - μ) + p_{A A} (1 - μ) \times p_{a a} v + p_{a A} (1 - μ) \times p_{A a} v + p_{A a} v \times p_{a a} v]}{A_{t}^{2}} F_{A a, t - 1}\} {(1 - U)}^{2} \end{array}

(13)

\begin{array}{l} F_{a a, t} = \{{(\frac{p_{A a} (1 - v) + p_{A A} μ}{a_{t}})}^{2} [\frac{1}{N A_{t - 1}} + (1 - \frac{1}{N A_{t - 1}}) F_{A A, t - 1}] \\ +(\frac{p_{a a} (1 - v) + p_{a A} μ}{a_{t}})^{2} [\frac{1}{N a_{t - 1}} + (1 - \frac{1}{N a_{t - 1}}) F_{a a, t - 1}] \\ + \frac{2[p_{A a} (1 - v) \times p_{a a} (1 - v) + p_{A a} (1 - v) \times p_{a A} μ + p_{a a} (1 - v) \times p_{A A} μ + p_{A A} μ \times p_{a A} μ]}{a_{t}^{2}} F_{A a, t - 1}\} {(1 - U)}^{2} \end{array}

(14)

\begin{array}{l} F_{A a, t} = \{(\frac{p_{A A} (1 - μ) + p_{A a} v}{A_{t}}) \times (\frac{p_{A a} (1 - v) + p_{A A} μ}{a_{t}}) [\frac{1}{N A_{t - 1}} + (1 - \frac{1}{N A_{t - 1}}) F_{A A, t - 1}] \\ +(\frac{p_{a A} (1 - μ) + p_{a a} v}{A_{t}}) \times (\frac{p_{a a} (1 - v) + p_{a A} μ}{a_{t}})[\frac{1}{N a_{t - 1}} + (1 - \frac{1}{N a_{t - 1}}) F_{a a, t - 1}] \\ + \frac{2 p_{A A} p_{a A} μ (1 - μ) + 2 p_{A a} p_{a a} v (1 - v) + (p_{A A} p_{a a} + p_{a A} p_{A a}) [(1 - μ) (1 - v) + μ v]}{A_{t} a_{t}} F_{A a, t - 1}\} {(1 - U)}^{2} \end{array}

(15)

where A_t and a_t are calculated using equations (7) and (8).

References

Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W: Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics. 2002, 161 (3): 1307-1320.
PubMed Central CAS PubMed Google Scholar
Leigh-Brown AJ: Analysis of HIV-1 env gene sequences reveals evidence for a low effective number in the viral population. Proc Natl Acad Sci USA. 1997, 94 (5): 1862-1865. 10.1073/pnas.94.5.1862.
Article Google Scholar
Achaz G, Palmer S, Kearney M, Maldarelli F, Mellors JW, Coffin JM, Wakeley J: A Robust Measure of HIV-1 Population Turnover within Chronically Infected Individuals. Mol Biol Evol. 2004, 21 (10): 1902-1912. 10.1093/molbev/msh196.
Article CAS PubMed Google Scholar
Seo T-K, Thorne JL, Hasegawa M, Kishino H: Estimation of Effective Population Size of HIV-1 Within a Host: A Pseudomaximum-Likelihood Approach. Genetics. 2002, 160 (4): 1283-1293.
PubMed Central PubMed Google Scholar
Shriner D, Shankarappa R, Jensen MA, Nickle DC, Mittler JE, Margolick JB, Mullins JI: Influence of random genetic drift on human immunodeficiency virus type 1 env evolution during chronic infection. Genetics. 2004, 166 (3): 1155-1164. 10.1534/genetics.166.3.1155.
Article PubMed Central CAS PubMed Google Scholar
Rouzine IM, Coffin JM: Linkage disequilibrium test implies a large effective population number for HIV in vivo. Proc Natl Acad Sci USA. 1999, 96 (19): 10758-10763. 10.1073/pnas.96.19.10758.
Article PubMed Central CAS PubMed Google Scholar
Haase AT: Population biology of HIV-1 infection: viral and CD4+ T cell demographics and dynamics in lymphatic tissues. Annu Rev Immunol. 1999, 17: 625-656. 10.1146/annurev.immunol.17.1.625.
Article CAS PubMed Google Scholar
Kouyos RD, Althaus CL, Bonhoeffer S: Stochastic or deterministic: what is the effective population size of HIV-1?. Trends Microbiol. 2006, 14 (12): 507-511. 10.1016/j.tim.2006.10.001.
Article CAS PubMed Google Scholar
Haddad DN, Birch C, Middleton T, Dwyer DE, Cunningham AL, Saksena NK: Evidence for late stage compartmentalization of HIV-1 resistance mutations between lymph node and peripheral blood mononuclear cells. AIDS. 2000, 14 (15): 2273-2281. 10.1097/00002030-200010200-00008.
Fulcher JA, Hwangbo Y, Zioni R, Nickle D, Lin X, Heath L, Mullins JI, Corey L, Zhu T: Compartmentalization of human immunodeficiency virus type 1 between blood monocytes and CD4+ T cells during infection. J Virol. 2004, 78 (15): 7883-7893. 10.1128/JVI.78.15.7883-7893.2004.
Wong JK, Ignacio CC, Torriani F, Havler D, Fitch NJS, Richman DD: In vivo compartmentalization of human immunodeficiency virus: evidence from the examination of pol sequences from autopsy tissues. J Virol. 1997, 71 (3): 2059-2071.
Shriner D, Liu Y, Nickle DC, Mullins JI: Evolution of intrahost HIV-1 genetic diversity during chronic infection. Int J Org Evolution. 2006, 60 (6): 1165-1176.
Barton NH: Genetic hitchhiking. Philos Trans R Soc Lond B Biol Sci. 2000, 355 (1403): 1553-1562. 10.1098/rstb.2000.0716.
Mansky LM: Forward mutation rate of Human Immunodeficiency Virus Type 1 in a T lymphoid cell line. AIDS Res Hum Retroviruses. 1996, 12 (4): 307-314.
O'Neil PK, Sun G, Yu H, Ron Y, Dougherty JP, Preston BD: Mutational analysis of HIV-1 long terminal repeats to explore the relative contribution of reverse transcriptase and RNA polymerase II to viral mutagenesis. J Biol Chem. 2002, 277 (41): 38053-38061. 10.1074/jbc.M204774200.
Shriner D, Rodrigo AG, Nickle DC, Mullins JI: Pervasive genomic recombination of HIV-1 in vivo. Genetics. 2004, 167 (4): 1573-1583. 10.1534/genetics.103.023382.
Zhuang J, Jetzt AE, Sun G, Yu H, Klarmann G, Ron Y, Preston BD, Dougherty JP: Human Immunodeficiency Virus Type 1 Recombination: Rate, Fidelity, and Putative Hot Spots. J Virol. 2002, 76 (22): 11273-11282. 10.1128/JVI.76.22.11273-11282.2002.
Jetzt AE, Yu H, Klarmann GJ, Ron Y, Preston BD, Dougherty JP: High rate of recombination throughout the human immunodeficiency virus type 1 genome. J Virol. 2000, 74 (3): 1234-1240. 10.1128/JVI.74.3.1234-1240.2000.
Levy DN, Aldrovandi GM, Kutsch O, Shaw GM: Dynamics of HIV-1 recombination in its natural target cells. Proc Natl Acad Sci USA. 2004, 101 (12): 4204-4209. 10.1073/pnas.0306764101.
Rhodes T, Wargo H, Hu WS: High rates of human immunodeficiency virus type 1 recombination: near-random segregation of markers one kilobase apart in one round of viral replication. J Virol. 2003, 77 (20): 11193-11200. 10.1128/JVI.77.20.11193-11200.2003.
Robertson A: Inbreeding in artificial selection programmes. Genet Res. 1961, 2: 189-194.
Woolliams JA, Bijma P: Predicting rates of inbreeding in populations undergoing selection. Genetics. 2000, 154 (4): 1851-1864.
Wray NR, Thompson R: Prediction of rates of inbreeding in selected populations. Genet Res. 1990, 55 (1): 41-54.
Ewens WJ: On the concept of the effective population size. Theor Popul Biol. 1982, 21: 373-378. 10.1016/0040-5809(82)90024-7.
Kimura M, Crow JF: The measurement of effective population numbers. Evolution. 1963, 17: 279-288. 10.2307/2406157.
Wright S: Evolution in Mendelian Populations. Genetics. 1931, 16: 97-159.
Hartl DL, Clark AG: Principles of Population Genetics. 1989, Sunderland, MA: Sinauer Associates, 2nd edition.
Allen TM, Altfeld M, Geer SC, Kalife ET, Moore C, O'Sullivan KM, Desouza I, Feeney ME, Eldridge RL, Maier EL, et al: Selective escape from CD8+ T-cell responses represents a major driving force of human immunodeficiency virus type 1 (HIV-1) sequence diversity and reveals constraints on HIV-1 evolution. J Virol. 2005, 79 (21): 13239-13249. 10.1128/JVI.79.21.13239-13249.2005.
Jones NA, Wei X, Flower DR, Wong M, Michor F, Saag MS, Hahn BH, Nowak MA, Shaw GM, Borrow P: Determinants of human immunodeficiency virus type 1 escape from the primary CD8+ cytotoxic T lymphocyte response. J Exp Med. 2004, 200 (10): 1243-1256. 10.1084/jem.20040511.
Liu Y, McNevin J, Cao J, Zhao H, Genowati I, Wong K, McLaughlin S, McSweyn M, Diem K, Stevens C, et al: Selection on the human immunodeficiency virus type 1 proteome following primary infection. J Virol. 2006, 80 (19): 9519-9529. 10.1128/JVI.00575-06.
Liu Y, McNevin J, Zhao H, Tebit DM, Troyer RM, McSweyn M, Ghosh AK, Shriner D, Arts EJ, McElrath MJ, et al: Evolution of HIV-1 CTL epitopes: Fitness-Balanced Escape. J Virol. 2007, 81 (22): 12179-12188. 10.1128/JVI.01277-07.
Jung A, Maier R, Vartanian J-P, Bocharov G, Jung V, Fischer U, Meese E, Wain-Hobson S, Meyerhans A: Multiply infected spleen cells in HIV patients. Nature. 2002, 418 (6894): 144. 10.1038/418144a.
Chun T-W, Carruth L, Finzi D, Shen X, DiGiuseppe JA, Taylor H, Hermankova M, Chadwick K, Margolick J, Quinn TC, et al: Quantification of latent tissue reservoirs and total body viral load in HIV-1 infection. Nature. 1997, 387: 183-188. 10.1038/387183a0.
Nei M: Molecular Evolutionary Genetics. 1987, New York: Columbia University Press
Kimura M, Crow JF: The Number of Alleles That Can Be Maintained in a Finite Population. Genetics. 1964, 49: 725-738.

Acknowledgements

We thank James I. Mullins for his guidance and critical comments on an early draft, two anonymous reviewers for constructive criticisms, and Reneé Ireton for editing the final version. The authors were supported by grants from the NIH (R03 AI055394, R01 HL072631, P01 AI57005, R01 AI058894, and R01 AI047734), the University of Washington Center for AIDS Research (NIH grant P30 AI27757), and a gift from the Frank H. Jernigan Charitable Foundation.

Author information

Authors and Affiliations

Department of Microbiology, University of Washington School of Medicine, Seattle, Washington, 98195, USA
Yi Liu & John E Mittler

Corresponding author

Correspondence to Yi Liu.

Additional information

Authors' contributions

YL and JM jointly conceived the study. YL derived the equations, wrote the computer code, performed the computational experiments, and drafted the manuscript. JM advised on the study design, participated in the analysis of the mathematical and computational data, and helped draft the manuscript. Both authors have read and approved the paper.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Liu, Y., Mittler, J.E. Selection dramatically reduces effective population size in HIV-1 infection. BMC Evol Biol 8, 133 (2008). https://doi.org/10.1186/1471-2148-8-133

Received: 13 December 2007
Accepted: 03 May 2008
Published: 03 May 2008
DOI: https://doi.org/10.1186/1471-2148-8-133
