Abstract
Precise dating of viral subtype divergence enables researchers to correlate divergence with geographic and demographic occurrences. When historical data are absent (that is, in the overwhelming majority of cases), viral sequence sampling on a time scale commensurate with the rate of substitution permits the inference of the times of subtype divergence. Currently, researchers use two strategies to approach this task, both requiring strong conditions on the molecular clock assumption of substitution rate. As the underlying structure of the substitution rate process at the time of subtype divergence is not understood and likely highly variable, we present a simple method that estimates rates of substitution, and from there, times of divergence, without use of an assumed molecular clock. We accomplish this by blending estimates of the substitution rate for triplets of dated sequences where each sequence draws from a distinct viral subtype, providing a zeroth-order approximation for the rate between subtypes. As an example, we calculate the time of divergence for three genes among influenza subtypes AH3N2 and B using subtype C as an outgroup. We show a time of divergence approximately 100 years ago, substantially more recent than previous estimates which range from 250 to 3800 years ago.
Background
Precise estimates are sorely lacking for dating the emergence and divergence of viral subtypes. Improved estimates equip epidemiologists and virologists to begin to correlate these important establishing events with historical demographic changes, geographical invasions and zoonoses, the transferring of a virus from one host species to another [7,1,25]. For example, archeological sequence data can furnish accurate dates and show that substantial genomic changes associate with geographical invasion and zoonosis [14,17]. Further, the recent availability of viral gene sequences sampled at a pace commensurate with their rate of nucleotide substitution vastly augments the ability to rigorously infer the time scale of phylogenies and hence determine the time of the most recent common ancestor (TMRCA) for different viral types [18,26,6].
Systematic studies characterize the substitution process and substitution rate process of several classes of viral subtypes in, for example, Dengue, influenza subtype A, human immunodeficiency virus (HIV) and the virus responsible for severe acute respiratory syndrome (SARS). For the last three viruses, a unique zoonotic transfer appears to co-occur with substantial changes in both the composition of nucleotides and amino acids as well as alterations in the rate of nucleotide substitution [15,14,1]. In Dengue, where a single subtype simultaneously inhabits two hosts (humans and Aedes aegypti) in a persistent zoonotic process, the introduction of the virus to new geographical environments associates with a dramatic increase in sequence diversity [25]. Unfortunately, no studies thus far analyze the rate of nucleotide substitution during either geographical invasion or zoonosis. Consequently, studies of the date of origins of viral subtypes must use strong a priori assumptions on the rate structure of nucleotide substitution.
Two primary methods are used to date the time of viral subtype divergence. The most commonly employed approach determines the divergence time of subtypes using a molecular clock assumption (MCA) over an entire phylogeny [18,21,5,26]. In its strict formulation, the MCA posits a proportional relation between the number of substitutions and the intervening time period over the entire phylogeny. Looser forms of the MCA require only that the proportionality hold along individual branches, with the rates across branches drawn from a prespecified distribution [5]. Committed to some variant of the MCA, current algorithms then estimate the rate of nucleotide substitution over all taxa in a given set. Consequently, these methods provide inference most suitable for situations where sequence evolution follows an MCA (e.g. influenza AH3N2 in human hosts, as in [9]) or deviates from the MCA homogeneously in time (e.g. perhaps influenza A in wild fowl, see [3]). In considering divergence events between viral subtypes, even when the MCA approximates nucleotide substitution well within a given subtype, the above methods may incorrectly infer the time of divergence across subtypes. By either assuming that a single rate of nucleotide substitution holds for the region preceding the common ancestor of each subtype or by smoothing the rate of nucleotide substitution over clades with different numbers of taxa, adherence to an MCA prevents direct inference of the rate during subtype divergence.
Suzuki and Nei (2002) propose an alternative, more heuristic method of estimation to counteract the problem of differing rates of substitution before and after zoonotic events [23,25]. In these studies, the evolutionary models draw a distinction between the rate of substitution within a given subtype and the rate of substitution between subtypes. However, trouble arises since there are no methods for estimating the latter quantity. Consequently, the models assume that the rate of substitution for portions of the phylogeny between the subtypes equals the mean rate in the initial host species population. For instance, in dating the time of divergence between influenza B hemagglutinin and influenza C hemagglutinin-esterase, Suzuki and Nei use the rate of amino acid substitution for water fowl for the portions of the phylogeny preceding the TMRCA of these two proteins [23]. While this method may accurately reflect the rate within avian and human hosts, it neglects whatever additional changes in the rate of substitution are due to the process of zoonotic adaptation, likely leading to a substantial underestimation of the date of the TMRCA.
The study here focuses on influenza, although the techniques are readily applied to other rapidly evolving organisms. Influenza has three types, A, B and C, classified based on serological analysis. To date, only type A sequences have been demonstrably associated with global pandemics [4]. Since modern surveillance began in the 1930s, type B has only been responsible for mild epidemics while type C has been nearly asymptomatic in human infection. Several subtypes of A, notably H1N1 and H3N2, are currently co-circulating in the human population. As the H1N1 and H3N2 subtypes may be as divergent from each other as they are from types B and C, we will refer to all types and subtypes simply as subtypes for the remainder of this paper. We select for this study three genes, coding for hemagglutinin (HA), the matrix protein (MP) and the nonstructural protein (NS) responsible for interfering with host immune response. Subtype C has a hemagglutinin-esterase gene that is analogous to the hemagglutinin gene in other subtypes [1]. We hence refer to the hemagglutinin gene generally and the hemagglutinin-esterase gene when referring specifically to the subtype C sequences.
We present a simple estimation tool to determine the date of divergence among viral subtypes that overcomes the difficulties encountered with use of the MCA by measuring the pairwise rate of substitution between taxa. Our estimator derives from the triplet statistic developed in [26,22,13], where each sequence member of the triplet draws from a different subtype. In this manner, we generate from each triplet an estimate of the rate of nucleotide substitution between the most recently diverged subtypes, and consequently provide an estimate of the TMRCA. This circumvents the problems posed by earlier methods by directly estimating the pairwise rate of nucleotide substitution over the set of pairs of sequences straddling the subtype divergence, without any rate assumption other than the existence of a mean. However, this method is only capable of determining the rate between two subtypes where a third, more distantly related, subtype functions as an outgroup. This method thus trades the ad hoc rate assumptions of the previous methods for two implicit conditions: (i) that subtypes have a unique divergence and (ii) that a third, comparable subtype is available to serve as an outgroup. In exchange, we arrive at a precise statistical measure of the TMRCA that converges as the number of taxa increases and is robust to the balancing of the numbers of taxa between different subtypes. We show that applying this method to dating the divergence of influenza subtypes AH3N2 and B gives a time of divergence approximately 100 years before present, substantially more recent than previous estimates.
Methods
To calculate the rate of nucleotide substitution, we require a measurement of the number of nucleotide substitutions occurring in a given time interval. Starting from a given set of aligned sequences {s_{1}, ..., s_{n}} for n taxa, we define the pairwise distance in number of substitutions to be the estimates {K_{ij}} under a given model of nucleotide substitution. Naturally the unobservable true values {D_{ij}} of the pairwise distances differ from their estimates {K_{ij}}. To understand this difference, we associate each D_{ij} with an error ε_{ij} and assume that ε_{ij} tends to zero as sequence lengths increase without bound. We further assume that the covariance between errors, cov(ε_{ij}, ε_{mn}), is bounded and known. For time measurements, we assume that each sequence is labeled by a sampling time t_{i} given in consistent units. Since we know the sampling time of a given sample only up to the unit of time reported (day, month, year), we posit a uniform error ν_{i} ~ U[0, 1] underlying each t_{i} over the unit sampling interval. To complete the error structure specification we force the two forms of error (ν_{i} and ε_{ij}) to be independent. Finally, for a set of three sequences (s_{i}, s_{j}, s_{k}) and their associated pairwise distances, we enforce a fixed topology among sequences, as shown in Figure 1, via methods outlined in [26]. We augment the topology with the observed sampling times of the three sequences; with α, the divergence time between the two sequences of interest; and with β, the divergence time of all sequences. When necessary for clarity, we write α_{ij} to indicate the true time of divergence between sequences i and j.
Figure 1. The phylogenetic relationships between three sequences s_{i}, s_{j} and s_{k}, sampled on dates t_{i}, t_{j} and t_{k} respectively. The time of most recent common origin of s_{i} and s_{j} is α. The time of the most recent origin of all sequences is β.
Under our triplet method, we aim to estimate the true rate of nucleotide substitution, p_{ij}, between sequences s_{i} and s_{j} with an unobserved error δ_{ij}. With respect to outgroup sequence k, an unbiased estimate is
where the factor corrects for bias resulting from the time sampling error structure (see Appendix for derivation). We superscript the estimate with k to denote its weak dependence on outgroup sequence k. The dependence is weak because the path of evolution from t_{k} to α is shared between the paths from sequence k to both sequence i and sequence j and hence largely cancels out in Equation 1. We make this transparent in the following derivation. For brevity, we consider only unobservable true values, ignoring error terms. Let u be the location on the triplet in Figure 1 corresponding to time α and let p_{xy} be the true rate along the path connecting locations x and y. Then, as distance is rate multiplied by time, we have
Subtracting the first equation from the second equation yields D_{ik} - D_{jk} = p_{ij}(t_{j} - t_{i}), which is equivalent to Equation 1. This derivation makes clear that the estimator (1) measures the rate along the path from sequence i to sequence j, with only incidental dependence on sequence k.
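The cancellation of the shared outgroup path can be illustrated with a minimal Python sketch. It computes only the uncorrected ratio of a distance difference to a time difference and omits the sampling-error correction factor of Equation 1. The function name, the example numbers and the orientation of the time axis (calendar years increasing toward the present, with the rate reported as positive when the later-sampled sequence lies farther from the outgroup) are assumptions of this sketch.

```python
def triplet_rate(k_ik, k_jk, t_i, t_j):
    """Uncorrected triplet rate between sequences i and j, with k as outgroup.

    k_ik, k_jk: estimated distances (substitutions per site) from i and j to k.
    t_i, t_j:   sampling dates, assumed here to be calendar years.
    """
    dt = t_i - t_j
    if dt == 0:
        raise ValueError("sequences i and j must have different sampling dates")
    # The path from the outgroup to the i/j split is common to both distances,
    # so it drops out of the difference.
    return (k_ik - k_jk) / dt


# Hypothetical numbers: i sampled in 2000, j sampled in 1980.
rate = triplet_rate(k_ik=1.46, k_jk=1.30, t_i=2000, t_j=1980)
print(f"{rate:.4f} substitutions per site per year")
```

With these illustrative inputs the rate comes out near 0.008 substitutions per site per year, regardless of how long the shared branch to the outgroup is.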
The variance for the estimator (1) is well approximated by
Further, we can estimate the time of subtype divergence α (Figure 1) between sequences via
We note that the term t_{i} + t_{j} - 1 is used rather than t_{i} + t_{j} to account for the expected error coming from the uniformly distributed ν_{i} and ν_{j}.
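As a rough illustration of how a rate and a pairwise distance translate into a date, the following Python sketch solves "distance equals rate multiplied by elapsed time" for the divergence time along the path from sequence i back to the common ancestor and forward to sequence j. It assumes a single mean rate on that path and calendar dates increasing toward the present; the function name, the `offset` parameter (standing in for the one-unit correction described above) and the numbers are assumptions of this sketch, not a reproduction of Equation 3.

```python
def divergence_time(k_ij, rate, t_i, t_j, offset=1.0):
    """Divergence date implied by one pairwise distance and one rate.

    Solves k_ij = rate * (t_i + t_j - offset - 2 * alpha) for alpha, i.e.
    the distance is spread over the two branches leading from the common
    ancestor to sequences i and j.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 0.5 * (t_i + t_j - offset - k_ij / rate)


# Hypothetical inputs: 1.5 substitutions per site between i and j,
# accumulated at 0.008 substitutions per site per year.
alpha = divergence_time(k_ij=1.5, rate=0.008, t_i=2000, t_j=1980)
print(round(alpha, 2))
```

Because the rate appears in the denominator, a rate that is too low pushes the inferred date further into the past, which is the mechanism discussed later for the molecular clock estimates.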
As nucleotide data increase without bound, K_{ij} → D_{ij} and p̂_{ij} → p_{ij}, ensuring that α̂_{ij} → α_{ij}. For finite sequence lengths, this relation ensures that . To gain an understanding of this estimator, we note that with a standard model of substitution (e.g. JC69, HKY85), a rate of substitution of 10^{-4} (s/s/yr) and a sequence of 2000 nucleotides, the above estimator yields a standard error of approximately 23 years [20].
The above derivations express our rate and time estimates for a single triplet of sequences. We now consider estimates that combine information across multiple representative sequences from each subtype. For discussion, we label subtypes A, B and C (which are only incidentally the same as the labels for influenza) and we assume the topology in Figure 1 for these groups. We let n_{r}, where r ∈ {A, B, C}, count the number of sequences in each group. Then when choosing triplets (s_{i}, s_{j}, s_{k}), there exist n_{A} · n_{B} · n_{C} choices, from which we form a single rate estimate that appropriately averages the set {p̂_{ij}^{k} : i ∈ A, j ∈ B, k ∈ C}. This works as all triplets have been selected to contain the divergence point between A and B. In order to make our estimate robust to outliers and noise, we employ an inverse variance weighting [12]. This standard weighting deemphasizes the contribution from estimates with high variance, providing significant protection against estimates with little information. Using this weighting, the estimate becomes
where P is the sum of the inverse variances of the individual triplet estimates.
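The weighting described above can be sketched in a few lines of Python. This is a generic inverse-variance weighted mean, given as an illustration under the assumption that each triplet contributes one rate estimate and one variance; the function name and the example values are hypothetical.

```python
def inverse_variance_mean(estimates, variances):
    """Inverse-variance weighted mean of a collection of estimates.

    Returns the weighted mean and P, the sum of the inverse variances.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need equally many estimates and variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)  # P in the text
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    return mean, total


# Three hypothetical triplet estimates; the first has the smallest variance
# and therefore dominates the average.
rates = [8e-3, 10e-3, 6e-3]
variances = [1e-6, 4e-6, 4e-6]
pooled, P = inverse_variance_mean(rates, variances)
```

Estimates with large variance receive small weights, so a few poorly informed triplets, for example those with nearly identical sampling dates, have little influence on the pooled rate.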
The global divergence time estimator is a variance-weighted average over the triplet-level divergence time estimates, with the weighted rate estimate substituted for the rate,
where P_{α} is the sum of the inverse variances of the individual divergence time estimates. Having found the global divergence time estimate, we estimate its variance by a bootstrap resampling of sequences from each subtype [8].
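A bootstrap over sequences within each subtype might look like the following Python sketch. It resamples each group with replacement, keeping group sizes fixed, and reports the standard deviation of the recomputed statistic. The function name, the `estimator` callback and the toy data are assumptions made for illustration; any statistic that accepts the resampled groups could be plugged in.

```python
import random
import statistics


def bootstrap_se(groups, estimator, n_boot=200, seed=0):
    """Bootstrap standard error of `estimator` under within-group resampling.

    groups:    dict mapping a subtype label to a list of sequence records.
    estimator: callable taking such a dict and returning a float.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample every subtype separately, preserving its size.
        resampled = {
            label: rng.choices(members, k=len(members))
            for label, members in groups.items()
        }
        replicates.append(estimator(resampled))
    return statistics.stdev(replicates)


# Toy example: the "records" are sampling years and the statistic is the
# difference between the mean sampling years of two groups.
toy = {"A": [1970, 1980, 1990, 2000], "B": [1960, 1975, 1985]}
se = bootstrap_se(toy, lambda g: statistics.mean(g["A"]) - statistics.mean(g["B"]))
```

Resampling within each subtype, rather than across the pooled data, keeps every replicate a valid set of cross-subtype triplets.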
The computational cost of this estimator is of order O(n^{3}) for a tree of n taxa. This is natural as each of the initial rate estimates is composed of information concerning three taxa. While the growth of computational expense in the number of taxa may appear unpleasant, in practice this algorithm is both fast and stable, owing to the absence of costly optimization procedures for parameter inference, and is able to handle data sets of thousands of taxa. The authors detail the computational efficiency of a similar statistic in [26]. As an example, for the data presented below all computations required only a few seconds on a desktop computer.
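The cubic growth comes directly from enumerating one sequence per subtype. A short Python sketch makes the count explicit; the labels and the group size of 20 are placeholders chosen for illustration.

```python
from itertools import product


def cross_subtype_triplets(group_a, group_b, outgroup):
    """Yield every (i, j, k) with i from A, j from B and k from the outgroup."""
    return product(group_a, group_b, outgroup)


# Placeholder labels standing in for sequences, 20 per subtype.
a = [f"A{n}" for n in range(20)]
b = [f"B{n}" for n in range(20)]
c = [f"C{n}" for n in range(20)]

n_triplets = sum(1 for _ in cross_subtype_triplets(a, b, c))
print(n_triplets)  # 20 * 20 * 20 = 8000
```

Each triplet requires only a handful of arithmetic operations on precomputed pairwise distances, which is why the enumeration remains cheap even though its size grows as the product of the group sizes.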
Data and Results
We demonstrate the advantage of our triplet estimator through analysis of influenza AH3N2/B subtype divergence using the hemagglutinin (HA), matrix protein (MP) and nonstructural (NS) genes. Each analysis is performed on 60 gene sequences constructed from 20 genomes each drawn from influenza subtypes AH3N2, B and C. We download these data along with their dates of sampling from the Los Alamos Influenza Database [16]. We perform sequence alignment using ClustalX [24, version 1.8]. For consistency with previous studies of AH3N2 HA evolution, we use the HKY model of nucleotide substitution [10]. We use the TREBLE algorithm, which implements an MCA, on sets of sequences solely drawn from a single subtype to derive within-subtype rates. The phylogenetic tree, generated by TREBLE, for the HA gene is depicted in Figure 2(a). We infer similar trees for the MP and NS genes. We calculate variances for both MCA and pairwise rate estimates using 200 bootstrap iterates. All dates are listed as years in the common era.
Figure 2. Phylogeny of 60 influenza hemagglutinin nucleotide sequences from subtypes AH3N2, B, and C. We reconstruct the phylogeny in (a) under a strict molecular clock via TREBLE [26]. The phylogeny in (b) is the same tree as in (a) with the divergence time between subtypes A and B recalibrated relaxing the molecular clock. (a) Without recalibration (b) With recalibration.
Consistent with previous studies [13], rates vary substantially both among genes and among subtypes. We record rates as a point estimate (± standard error). For the HA gene, subtype AH3N2 shows a rate of nucleotide substitution of 3.21 (± 0.43) × 10^{-3} s/s/yr. This rate is slightly lower than those recorded in previous studies although within the margin of error [26]. For subtype B, the rate of nucleotide substitution is 2.31 (± 0.37) × 10^{-3} s/s/yr, which is higher than previous estimates although also within the margin of error [23], and for subtype C, the rate is 0.68 (± 0.18) × 10^{-3} s/s/yr. For the MP gene, rates are generally lower than those for HA. The subtype AH3N2 rate is 1.57 (± 0.38) × 10^{-4} s/s/yr. The subtype B rate is 2.20 (± 0.48) × 10^{-3} s/s/yr and the subtype C rate is 1.31 (± 0.33) × 10^{-3} s/s/yr. Lastly, for the NS gene, the rates are similar to those of the MP gene. The subtype AH3N2 rate is 2.14 (± 0.25) × 10^{-3} s/s/yr, the subtype B rate is 1.92 (± 0.20) × 10^{-3} s/s/yr, and the subtype C rate is 1.68 (± 0.51) × 10^{-3} s/s/yr. Table 1 presents these results. Figure 3 provides histograms of the bootstrap distributions for all three genes and subtypes.
Table 1. Within-subtype rates of nucleotide substitution for hemagglutinin (HA), matrix (MP) and nonstructural (NS) genes for subtypes AH3N2, B and C.
Figure 3. Histograms of the time of most recent common ancestor for subtypes AH3N2, B and C, respectively, derived from molecular clock estimates on hemagglutinin (HA), matrix (MP) and nonstructural (NS) gene sequences. (a) Subtype AH3N2 (b) Subtype B (c) Subtype C.
Assuming a molecular clock within a subtype and with the rates above, we generated the corresponding dates of the TMRCA. Figure 3 shows histograms of the TMRCA estimates for different genes and subtypes. All genes are similar in dating the TMRCA for AH3N2 to approximately 1965 (1964, 1965, and 1962 for HA, MP and NS genes, respectively). These dates are consistent with the emergence of the AH3N2 subtype into global circulation during the 1968 pandemic [1]. Both the MP and NS genes date the TMRCA of subtype B to 1943, while the HA rate places the TMRCA at 1953. This latter value is inconsistent with the influenza B subepidemics of 1950–51 but is consistent with the emergence of the more lethal Victoria strain of influenza B in 1953 [11]. Each of these estimates has a standard error of approximately 2 years and so these discrepancies may be accounted for by measurement uncertainty. The 10-year gap between the TMRCA suggested by the different genes can be explained by a reassortment event. Finally, the TMRCA of subtype C is calculated as 1952 and 1953 by the MP and NS genes, respectively, while the HA gene places the TMRCA at 1906. This nearly half-century discrepancy suggests that the subtype C HA gene experienced a markedly different evolutionary history than either the MP or the NS gene. A biologically plausible explanation would be a reassortment event. Another possible explanation is that non-MCA rate behavior has led to substantial bias in dating the TMRCA.
We now compare the results from pairwise rate estimates across subtypes AH3N2 and B with those from application of the MCA to the same data. These results are summarized in Table 2 and Figure 4. Using the triplet method developed above, data from the hemagglutinin gene yield a pairwise rate of substitution between subtypes AH3N2 and B of 8.66 (± 0.26) × 10^{-3} s/s/yr. Via Equation 3, and averaging over all possible pairs of sequences (s_{i}, s_{j}) ∈ {AH3N2, B}, the date of divergence between the two subtypes is then 1905 (± 20) years. Under a molecular clock, the substitution rate for HA over both subtypes AH3N2 and B is 3.10 (± 0.37) × 10^{-3} s/s/yr, implying a TMRCA at 1789 (± 12.5). A similar pattern emerges for the MP gene. The pairwise rate of substitution is 6.46 (± 1.31) × 10^{-3} s/s/yr, yielding a TMRCA at 1912 (± 18) years. The MCA rate of substitution is 2.13 (± 0.35) × 10^{-3} s/s/yr with a corresponding TMRCA of 1759 (± 15). Finally, for the NS gene, the pairwise rate of substitution is 7.95 (± 0.25) × 10^{-3} s/s/yr, leading to a TMRCA of 1902 (± 19) years. Under the MCA, the rate of substitution is 2.22 (± 0.38) × 10^{-3} s/s/yr with a corresponding TMRCA of 1777 (± 14). Summarizing these results, we find that the pairwise rate estimates are consistent in placing the TMRCA at approximately 1905 while the MCA rate estimates correspond to a TMRCA at approximately 1775. This discrepancy between the two sets of estimates of the TMRCA likely owes to the inability of the MCA to integrate information from the period of evolution between the two subtypes, leading to a substantial underestimate of the rate of substitution, and consequent underestimation of the date of the TMRCA.
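The size of the discrepancy is easier to see as ratios and differences. The Python sketch below simply tabulates the point estimates quoted in this paragraph; the rate pairs are entered as mantissas in a common unit, so only their ratio is meaningful, and the variable names are illustrative.

```python
# (pairwise rate, MCA rate) mantissas in a common unit, and
# (pairwise TMRCA, MCA TMRCA) in calendar years, as quoted in the text.
reported = {
    "HA": {"rates": (8.66, 3.10), "tmrca": (1905, 1789)},
    "MP": {"rates": (6.46, 2.13), "tmrca": (1912, 1759)},
    "NS": {"rates": (7.95, 2.22), "tmrca": (1902, 1777)},
}

summary = {}
for gene, values in reported.items():
    pairwise_rate, clock_rate = values["rates"]
    pairwise_date, clock_date = values["tmrca"]
    summary[gene] = {
        "rate_ratio": pairwise_rate / clock_rate,
        "years_later": pairwise_date - clock_date,
    }

for gene, row in summary.items():
    print(f"{gene}: rate ratio {row['rate_ratio']:.2f}, "
          f"TMRCA later by {row['years_later']} years")
```

For all three genes the pairwise rate is roughly three times the clock rate, and the pairwise TMRCA falls more than a century after the clock-based one.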
Table 2. Across-subtype rates of nucleotide substitution between subtypes AH3N2 and B for hemagglutinin (HA), matrix (MP) and nonstructural (NS) genes.
Figure 4. Histograms of the time of most recent common ancestor of subtypes AH3N2 and B, derived from molecular clock estimates (light grey) and pairwise estimates (dark grey) on hemagglutinin (HA), matrix (MP) and nonstructural (NS) gene sequences.
Discussion
We present a new method for ascertaining the rate of nucleotide substitution between subtypes and apply this method together with traditional MCA methods to date the divergence of influenza subtypes AH3N2, B, and C. We use three genes, HA, MP and NS, to date two types of divergence events: the time of the most recent common ancestor of each subtype and the time of divergence between two subtypes, AH3N2 and B. For the former event type, we show that the three genes are loosely consistent in their dating of the TMRCA of the subtypes, with the notable exception of the HA-derived estimate of subtype C's TMRCA, which falls approximately 50 years before the MP- and NS-derived estimates. This discrepancy may indicate either that subtype C's hemagglutinin-esterase gene engaged in a biologically significant event, such as reassortment, or that MCA estimation does not adequately model the evolution of the gene.
For the divergence between subtypes AH3N2 and B, previous studies using the MCA generally place the time of divergence several hundred years ago, ranging from the 16th to early 19th centuries. Other analyses have yielded estimates of 3600 years ago [23]. In the current study, application of the MCA yielded estimates in the last half of the 18th century. However, applying the pairwise rate estimate developed above we find uniformly, across genes, that the divergence likely occurred in the very early 20th century. The discrepancy between these two measures is likely due to the increased modeling flexibility of the pairwise rate estimate relative to the MCA.
This discrepancy between the rates and corresponding TMRCA estimates has important biological consequences. The phylogenetic divergence between subtype AH3N2 and B corresponds to a subspeciation event for the virus. The results in this study indicate that the process of speciation is not neutral but instead a period of rapid and intense genetic change. The three genes studied here consistently show large acceleration in the rate of nucleotide substitution for the divergence period relative to the rates observed within a stable subtype. This study gives strong evidence that, at least for influenza viral subtype divergence, the process of subspeciation is associated not just with large genomic changes but also with an accelerated, finite process of adaptation.
Assuming that the more recent estimate is correct, a subsequent question is whether or not a pandemic or epidemic associates with subtype AH3N2/B divergence. In the twentieth century, all influenza pandemics associate with the emergence or re-emergence of subtypes (AH1N1 in 1918, AH2N2 in 1957 and AH3N2 in 1968). Serological analysis indicates that the 1897 pandemic was likely due to subtype AH2N2. However, the pandemic of 1900 is of uncertain type, although it is commonly reported in the literature as being due to AH3N2 [4]. The above analysis makes it plausible that this pandemic was caused by the emergence of subtype AH3N2 or B.
As noted above, we condition the results presented here on a specific sequence alignment. As the question under consideration concerns the divergence of specific genes and proteins over a (presumably) long time scale, the capacity to generate reasonable alignments diminishes with increasing time of divergence between types, conditional on the rate of substitution. We find that for the hemagglutinin gene, a proportion of sequence alignments support the split of subtype B from subtype C after the split between subtypes AH3N2 and B, in opposition to the topology enforced in our analysis. Hence, to some unknown degree, our analysis is necessarily biased by the choice of alignment. This suggests that improved dating can be found by integrating estimation procedures over an ensemble of alignments [19].
The pairwise estimate method presented above is accurate in the scale
where is the total time over the phylogeny and p is mean rate over the phylogeny [26]. This relation dictates that as divergence events become more remote the ability of the triplet method to resolve the time of divergence diminishes. While this limit prohibits the calculation of remote divergence events, the example presented above lies within the appropriate scale.
In place of a specific MCA, the estimates presented here directly calculate the rate of substitution between taxa from different viral subtypes. As such estimates span paths between subtypes, they simultaneously capture the rate of evolution along branches both within and between subtypes. From these estimates, we are able to directly infer the time of divergence between subtypes. In exchange for relaxing the MCA, the method requires an outgroup subtype to function as an origin relative to the subtypes under consideration. We feel that the triplet method provides a simple and widely applicable way to calculate the dates of divergence of rapidly evolving organisms without the pitfalls of the MCA.
Conclusion
We present a simple method for calculating the time of viral subtype divergence that does not assume a molecular clock over the entire phylogeny. Additionally, the estimator of this method, a weighted sum of pairwise estimates, furnishes a defined variance for the time of the most recent common ancestor between subtypes. As a tradeoff for this increased precision, the structure of the triplet statistic requires an outgroup set of sequences, usually a closely related subtype. We apply this estimator to the case of influenza subtype divergence, considering three genes. We show that the estimated divergence time of subtypes AH3N2 and B is more than a century later than those calculated with a molecular clock.
Authors' contributions
JDO'B collected the data, designed and performed the study and wrote the initial manuscript. ZSS provided extensive review of the study design and provided assistance in revising the manuscript. MAS contributed extensive work in reviewing and revising the manuscript.
Appendix
Initially, one might define an estimator of the conditional pairwise rate to be
which has previously been used in the paper outlining the TREBLE algorithm [26], and originates in [13]. However, this apparently natural statistic is substantially biased when the sampling times of sequences i and j are close. As the following derivation shows, this bias results from the time sampling error structure.
As the true value of the rate of substitution is given by
we then have an expression for the error:
Taking the expectation yields the bias:
Since we assume that the ν and ε structures are independent, the right side of the equation can be further reduced, yielding
Let Δt = t_{i} - t_{j}. The final expectation on the right hand side resolves by direct integration,
We note that as the sampling time is independent of the rate of nucleotide substitution, the error increases in proportion to the magnitude of the initial statistic. We can then create a new, unbiased statistic by counterbalancing the original statistic with this factor, making a new statistic
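To give a feel for the magnitude of this effect, the following Python sketch evaluates the expected inverse time difference when each sampling date carries an independent U[0, 1] error, and compares it with the naive value 1/Δt. The closed form below is worked out for this illustration by integrating against the triangular density of ν_{i} - ν_{j} on [-1, 1]; it is an assumption of the sketch and is not claimed to be the exact expression used in the derivation above.

```python
import math


def mean_inverse_gap(dt):
    """E[1 / (dt + nu_i - nu_j)] for independent nu_i, nu_j ~ U[0, 1], dt > 1.

    nu_i - nu_j has the triangular density 1 - |x| on [-1, 1]; integrating
    (1 - |x|) / (dt + x) over that interval gives the expression below.
    """
    if dt <= 1:
        raise ValueError("this closed form requires dt > 1")
    return ((dt + 1) * math.log(dt + 1)
            + (dt - 1) * math.log(dt - 1)
            - 2 * dt * math.log(dt))


def inflation(dt):
    """Ratio of the expected inverse gap to the naive value 1 / dt."""
    return dt * mean_inverse_gap(dt)


for dt in (2, 5, 20):
    print(dt, round(inflation(dt), 4))
```

In this sketch the inflation is a few percent when the recorded dates are two units apart and shrinks to a small fraction of a percent at twenty units, in line with the observation that the uncorrected statistic is most biased for closely spaced sampling times.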
Acknowledgements
JD O'Brien was supported by the NIGMS Systems and Integrative Biology Training Grant for the duration of this work. MA Suchard is supported by an Alfred P. Sloan Research Fellowship in Computational and Evolutionary Molecular Biology and a John Simon Guggenheim Memorial Fellowship.
References

1. Brown EG: Influenza virus genetics. Biomedicine & Pharmacotherapy 2000, 54:196-209.
2. Buonagurio DA, Nakada S, Fitch WM, Palese P: Epidemiology of influenza C virus in man: multiple evolutionary lineages and low rate of change. Virology 1986, 153(1):12-21.
3. Chen R, Holmes EC: Avian influenza virus exhibits rapid evolutionary dynamics. Molecular Biology and Evolution 2006, 23(12):2336-2341.
4. Dowdle WR: Influenza pandemic periodicity, virus recycling, and the art of risk assessment. Emerging Infectious Diseases 2006.
5. Drummond A, Ho SY, Phillips MJ, Rambaut A: Relaxed phylogenetics and dating with confidence. Public Library of Science Biology 2006, 4(5).
6. Drummond A, Nicholls GK, Rodrigo AG, Solomon W: Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 2002, 161:1307-1320.
7. Drummond A, Pybus OG, Rambaut A, Forsberg R, Rodrigo AG: Measurably evolving populations. Trends in Ecology and Evolution 2003, 18(9):481-488.
8. Efron B, Tibshirani RJ: Introduction to the Bootstrap. CRC Press, New York; 1993.
9. Ferguson NM, Galvani AP, Bush RM: Ecological and immunological determinants of influenza evolution. Nature 2003, 422(6930):428-433.
10. Hasegawa M, Kishino H, Yano TA: Dating the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 1985, 22(2):160-174.
11. Hennessy AV, Minuse E, Davenport FM: A twenty-one-year experience with antigenic variation among influenza B viruses. Journal of Immunology 1965, 94(2):301-306.
12. Huber PJ: Robust statistics: A review (1972 Wald lecture). Annals of Mathematical Statistics 1972, 43(4):1041-1067.
13. Kashyap R, Subas S: Statistical estimation of parameters in a phylogenetic tree using a dynamics model of the substitutional process. Journal of Theoretical Biology 1974, 47(1):75-101.
14. Lemey P, Pybus O, Wang B, Saksena NK, Salemi M, Vandamme AM: Tracing the origin and history of the HIV-2 epidemic. Proceedings of the National Academy of Sciences 2003, 100(11):6588-6592.
15. Lu H, Zhao Y, Zhang J, Wang Y, Li W, Zhu X, Sun S, Xu J, Ling L, Cai L, Bu D, Chen R: Date of origin of the SARS coronavirus strains. BMC Infectious Diseases 2004, 4:3.
16. Macken C, Lu H, Goodman J, Boykin L: The value of a database in surveillance and vaccine selection. In Options for the Control of Influenza IV. Edited by Osterhaus A, Cox N, Hampson A. Elsevier Science, Amsterdam, NL; 2001:103-106.
17. Mills CE, Robins JM, Lipsitch M: Transmissibility of 1918 pandemic influenza. Nature 2004, 432(7019):904-906.
18. Rambaut A: Estimating the rate of molecular evolution: Incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 2000, 16(4):395-399.
19. Redelings B, Suchard MA: Joint Bayesian estimation of alignment and phylogeny. Systematic Biology 2005, 54(3):401-418.
20. Rzhetsky A, Nei M: Tests of applicability of several substitution models for DNA sequence data. Molecular Biology and Evolution 1995, 12(1):131-151.
21. Sanderson MJ: r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 2003, 19(2):301-302.
22. Seo TK, Thorne JL, Hasegawa M, Kishino H: A viral sampling design for testing the molecular clock and for estimating evolutionary rates and divergence times. Molecular Biology and Evolution 2002, 18(1):115-123.
23. Suzuki Y, Nei M: Origin and evolution of influenza hemagglutinin genes. Molecular Biology and Evolution 2002, 19(2):501-509.
24. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 1997, 25(24):4876-4882.
25. Twiddy SS, Holmes EC, Rambaut A: Inferring the rate and timescale of Dengue virus evolution. Molecular Biology and Evolution 2001, 20(1):122-129.
26. Yang Z, O'Brien JD, Zheng XB, Zhu HQ, She ZS: Tree and rate estimation by local evaluation of heterochronous data. Bioinformatics 2007, 23(2):169-176.