Abstract
Background
Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequentially, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome.
Results
Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodnessoffit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in nonrecombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures of neutrality in human influenza A virus, even after controlling for variation in population size.
Conclusion
Importantly, by employing a full modelbased Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.
Background
The field of population genetics has a long history in the development of tests of selective neutrality. This is both because of the difficulty of developing a tractable alternative to the neutral theory and because of the ongoing debate about how well the neutral theory can explain real data. Although a number of important steps have been made to develop powerful tests of neutrality [13] there are evident problems with many currently available tests. For example many of the tests, such as Tajima's D (D_{T}) and Fu and Li's D (D_{F}) have difficulty in accurately discriminating between selection and changes in population size.
In fact, most available tests of neutrality can only test constant population size neutrality against alternatives that include both population growth and selection. Furthermore, most tests require accurate knowledge of the number of mutations that have occurred or the branch lengths in the gene tree, and do not adequately take into account the uncertainty in these quantities (i.e. most tests implicitly assume an infinitesites model of evolution). Finally, treebased summary statistics are often based on one estimate of the genealogy, despite the fact that the true genealogy and branch lengths are seldom known.
Broadly speaking, on the basis of the sequence information used, statistics for testing neutrality can be placed into three classes:
1. statistics that use the mutation (segregating site) frequency spectrum [1,2,4,5],
2. statistics that use the haplotype distribution [3,6,7] and
3. statistics that use the pairwise distance (mismatch) distribution [8,9].
A recent comprehensive survey of the power of these different classes of tests for detecting population expansion found that classes 1 and 2 were generally more powerful than the best class 3 statistics [10]. Some of the bestknown test statistics come from class 1 and essentially work by comparing aspects of the mutation frequency spectrum with neutral expectations. This class of test statistics include D_{T }[1], D_{F }[2] and the H statistic [5]. In the simplest case, these statistics can be used to measure deviations from the null hypothesis of constant population size, random mating and no recombination. For example D_{F }measures the normalized difference between the number of mutations on the external branches and the total number of mutations in the genealogy. Under the null hypothesis of neutral evolution the expectation of D_{F }is zero, and a significant departure from zero signifies selection (balancing, directional, negative), recombination or changes in population size. The last of these alternatives is problematic because exponential growth is expected to give results similar to directional or purifying selection. For this reason it would seem desirable to develop a method that directly accounts for alternative demographic models of population size through time. In this context, several studies have combined the use of summary statistics and demographic models [1115].
Apart from biasing the mutation frequency distribution, selection may also affect the shape of the gene tree [16]. Although few attempts have been made to use this expectation in a rigorous test of neutrality (c.f. [17]), a number of branching models and summary statistics measuring tree imbalance exist in the literature of speciation models [1821]. A method that could use information both from the mutation frequency spectrum and from the shape of the gene tree may be more powerful than either used individually.
If all sequences comprising a gene tree are sampled from the same time point (as is required by most tests of neutrality) then there is very little power to distinguish between selection and exponential growth. However if sequence data is available from different times, during which measurable evolution has taken place, as in RNA viruses and ancient mitochondrial DNA (mtDNA) data [22,23] then the power to distinguish between these two alternatives is potentially much greater. Unfortunately, the expectations and variances of crucial quantities (such as tree length) are not yet available for serially sampled data, so this potential power has not been tapped.
Apart from analyses of intrapopulation sequence variation, evidence for nonneutrality can also be detected by comparing within and betweenspecies sequence variation [24]. For example, it has been widely observed that in some species there is an excess number of polymorphic nonsynonymous sites segregating within the species relative to the number of nonsynonymous sites with fixed differences between closely related species [19,25]. This effect is consistent with the conclusion that a substantial fraction of nonsynonymous mutations are slightly deleterious mutations (SDMs) that often persist as polymorphisms within populations for some time but have a low probability of eventual fixation [26]. However this pattern is not universal. In fact, at least in Drosophila the pattern appears to be the reverse [27], possibly implying a prominent role for recurring positive selection [28]. Regardless of the direction of nonneutral evolution this test may suggest, it has been shown that, as with summary statistics of the mutation frequency spectrum, the accuracy of these methods is compromised by the effects of unrecognized historical demographic change [29]. Both withinspecies and betweenspecies methods rely on the fact that SDMs become increasingly rare relative to neutral mutations at higher frequencies. For example, within a panmictic population, the distribution of SDMs is expected to predominate near the tips of a population genealogy [30], so that SDMs are on average younger than neutral mutations [25]. Thus the older branches (and associated mutations) within a population will tend to consist of relatively fewer SDMs (as purifying selection has had longer to act).
Although a number of researchers have observed nonneutral behaviour of nonsynonymous polymorphism in proteincoding regions, few have considered the effect of SDMs on linked genetic variation in noncoding regions. This is particularly pertinent to the study of the control region of mitochondrial genes, which is extensively used for withinpopulation genetic sampling of animal mtDNA [31]. The action of HillRobertson interference is expected to exacerbate the persistence of SDMs in populations [32], because it reduces the efficiency of purifying selection. Even moderately deleterious mutations, which would otherwise be removed by selection very quickly, can persist in the population if there is substantial genetic linkage between sites [30]. Therefore, in nonrecombining genetic elements such as the mitochondrial genome and the genomes of negatively stranded RNA viruses, mutations that are themselves selectively neutral will nevertheless tend to share the fate of linked deleterious mutations.
In this paper we extend an existing Bayesian method originally applied to investigating nonneutrality in HIV evolution [33], that can be used to test for selective neutrality in both coding and noncoding genetic regions sampled from within a single population. The method assumes no knowledge of ancestral mutation frequencies and takes into account the confounding effects of demographic history. We demonstrate the utility of this method on four examples comprised of one noncoding data set and three coding data sets. This method assumes a single genealogy describes the evolutionary history of the sequences under study, but makes no assumptions about ancestral mutation frequencies and takes into account the confounding effects of demographic history. We demonstrate the utility of this method on four nonrecombinant examples comprised of one noncoding data set and three coding data sets.
Results and discussion
We employed a suite of summary statistics to test the assumption of neutrality on four example data sets. Because selection is expected to change both the distribution of mutations on the tree and the shape of the sample genealogy [30], statistics that measure both of these departures were included in the analysis.
Summary statistics
Fu and Li [2] compared two estimates of population parameter θ that can be derived for a sample of n sequences:
1. the total number of singleton polymorphisms and
2. the total number of segregating sites divided by .
Under neutrality the difference between these two measures is expected to be zero, and the variance in the difference can be calculated. The resulting normalized test statistic D_{F }assumes an infinite sites model of mutation, because it equates mutations with branch lengths in the underlying coalescent tree and does not therefore account for the possibility of multiple mutations at a single site. To avoid this assumption we employ a genealogybased version of D_{F}, which compares the length of terminal branches to the total length of the coalescent genealogy (we term this the genealogical D_{F}). In addition to the genealogical D_{F}, two other measures of branch length distribution (age of most recent common ancestor, and total tree length; see Table 1) and three measures of tree imbalance B_{1}, I_{c }and C_{n }were also employed.
Table 1. Summary statistics used in test of neutrality
The B_{1 }statistic is the maximum number of nodes between an internal node and the tips of the tree, summed over all internal nodes and excluding the root [34]. Higher values of B_{1 }are expected with increasing symmetry of the phylogeny. Colless's tree imbalance index I_{c }considers each internal node of a bifurcating tree and partitions the number of terminal sequences that descend from it into two groups, r and s, where r ≥ s. Symmetry is measured based on the difference between r and s, summed over all internal nodes [18]. The measure increases from 0 for a perfectly symmetrical tree, to 1 if the tree is completely asymmetric. The final treeasymmetry measure, Cherry count C_{n}, is simply the number of pairs of sequences joined by their most recent common ancestor [20]. More symmetrical trees are identified by higher values of C_{n}. All six summary statistics used are listed in Table 1.
Data analysis
Brown bear mitochondrial DNA
An alignment of noncoding mitochondrial DNA from the dloop of brown bears Ursus ursus was compiled as an example of noncoding molecular sequence data that is assumed to be evolving neutrally. The data set comprised 30 previously published ancient DNA sequences [35], along with 44 modern brown bear sequences obtained from GenBank. The software BEAST [36] was used to conduct Bayesian MCMC analysis on the full data set (n = 74), yielding estimates of evolutionary rate, population size and ancestral genealogy (Table 2). The substitution model chosen allowed for different rates of transitions and transversions [37] as well as Γdistributed rate heterogeneity among sites [38]. Both constantsize and exponentialgrowth models of demography were investigated. To test if the assumption of neutrality was warranted, posterior and posterior predictive values were calculated for each of the summary statistics in Table 1, along with their corresponding multivariate posterior predictive pvalue. A Bayes factor computed via importance sampling [39,40] was used as a model choice criterion to compare the relative marginal likelihoods of the two models, resulting in rejection of the exponential growth model in favour of a constantsize population. However under both models, differences between the posterior and posterior predictive values did not suggest any significant departures from neutrality in any of the six summary statistics investigated. The multivariate pvalues for constant and exponential growth were 0.219 and 0.284 respectively. This result suggests, at least in terms of tree asymmetry and branch length distribution, that selective neutrality cannot be rejected for the dloop of brown bears.
Table 2. Bayesian parameter estimates
RNA virus data sets
Three RNA virus data sets were also analyzed under the same model conditions as described above. The first was a multiple sequence alignment (n = 129) of the g gene (L = 629 bp) of human respiratory syncytial virus (HRSV) spanning 46 years from 1956 to 2002 [41]. This virus was used as an example of a coding gene of an RNA virus that exhibits only a weak signal of nonneutrality in terms of its tree shape. The estimates of mutation rate and population size are shown in Table 2. A constant population size was preferred over exponential growth using a Bayes factor. The multivariate posterior predictive pvalues did not reject neutrality (p = 0.33). We followed up with a series of univariate analyses using the individual summary statistics. The tree length T, age of the root t_{MRCA }and D_{F }statistics are all close to significance under the assumption of constant population size, as shown in Table 3, while the remaining univariate statistics are less suggestive. Therefore, there is only marginal evidence for low levels of nonneutrality in the tree shape of HRSV.
Table 3. Predictive Probabilities
To demonstrate the ability of this method to detect nonneutrality, two additional data sets were analyzed. The first was a previously published data set of the E gene of the dengue4 virus (n = 69, L = 1485) from Puerto Rico [42] spanning 17 years. The second was a data set of hemagglutinin sequences from human influenza A virus selected to have a similar time frame (1981–1998). These two viral data sets are both expected to exhibit the effects of adaptive selection, particularly influenza A virus, given the nature of their life histories [4244]. As for the previous data sets, posterior and posterior predictive values were calculated for each of the summary statistics in Table 1. Under constant population size, the multivariate posterior predictive pvalue = 0.0269 for the dengue4 virus data set and = 0.0240 for the influenza A virus data set. Both of these data sets exhibited significantly more negative D_{F }then expected under neutrality, suggesting that the relative length of the terminal branches is larger than expected in both data sets. Additionally, the human influenza A data set also had marginally more treeimbalance than expected under neutrality, and for the dengue4 data set, the age of the most recent common ancestor was significantly smaller than expected under neutrality. Figure 1 shows the posterior and predictive distributions for D_{F }and tree length T for all four data sets. Figure 2 shows (A) two human influenza A virus trees from the posterior distribution of the exponential growth analysis along with (B) the corresponding simulated trees in the predictive distribution. These observations provide qualitative evidence for the ability to detect nonneutral evolutionary dynamics from tree shape as suggested in a recent review of the nascent field of phylodynamics [16].
Figure 1. Posterior and predictive distributions of tree length T and D_{F}. Posterior and predictive distributions of tree length T and D_{F }for all four data sets. The dengue4 data is from an analysis assuming exponential growth, while the other three analyses assumed a constant population size. Human influenza A virus shows the largest departure from neutrality, with the posterior distribution completely disjoint from the predictive distribution.
Figure 2. Posterior and posterior predictive genealogies of human influenza A virus. (A) A sample of two trees from the posterior distribution of the human influenza A virus data set. (B) The two matching trees simulated for the predictive distribution of the human influenza A virus data set. Obvious differences between the posterior and predictive trees are the shorter tree length and absence of deep splits in the posterior trees.
Distinguishing selection from exponential growth
A criticism often leveled at tests of neutrality such as D_{F}, is that significantly negative values of D_{F }could signify exponential growth rather than nonneutral evolution. As demonstrated above the methodology employed here allows the demographic history to be described parametrically as part of the model. Therefore, inference and testing can both be achieved under a model of exponential growth. In this case, any additional departure from expectations cannot be attributed to exponential growth as the demographic signal is incorporated into the test via the predictive distribution. The results of the tests including exponential growth are also presented in Table 3. Interestingly, model selection by Bayes factors can not strongly reject constant populations in all of the data sets except for dengue4. In the case of dengue4, the log Bayes factor in favor of the exponential population model is approximately 8. However, the multivariate pvalue for dengue4 is no longer significant once exponential growth is incorporated. We can therefore distinguish between selection and growth in the dengue4 and influenza data sets. In dengue4, the departure from neutral expectations can be explained by an incorrect choice of demographic functions. Whereas in influenza, significant departures from neutral expectations are observed under both demographic scenarios. In contrast, there is little evidence of nonneutrality in the bear and HRSV data sets.
Simulations
For infinitely long sequences, for which no uncertainty in the underlying genealogy exists, p_{B }behaves like a classical pvalue. In the infinite data situation, the posterior distribution of T(·) collapses to a single point and equation (5) then returns the probability of observing a test statistic under the null hypothesis of selective neutrality as extreme as the test statistic of the data. In finite data situations, p_{B }is stochastically less variable than a uniform distribution but with the same mean. This implies that the distribution of p_{B }is more centered about 1/2 than a uniform random variable, leading to slightly more conservative tests when one choses small Type I error rates. To test the assertion that p_{B }can still be interpreted as a pvalue even when sequences are of short length and there is significant uncertainty in the underlying genealogy, a simulation study was undertaken. A number of replicate data sets (n = 100) were simulated and analyzed as follows:
1. A timestructured coalescent tree was simulated with sample times at 0, 300, 600, and 900 days, with 10 sequences at each time and a constant populationsize parameter N_{e}τ = 1500 (the product of N_{e }and generation length in days).
2. DNA sequences of length 400 were simulated down the coalescent tree under an HKY85 + Γ model of substitution with parameters κ = 8, σ = 0.1, and μ = 4.0 × 10^{5 }per site per day. Insertions and deletions were not simulated.
3. A Bayesian MCMC analysis was run on the resulting DNA sequence alignments using BEAST (Drummond and Rambaut 2004), assuming a constant population and an HKY85 + Γ model of substitution. The demographic and substitution parameters were all estimated assuming flat priors with conservative upper bounds.
4. For each state p of the MCMC, the sampled population size parameter θ^{(p) }~ P(⋯Y) was used to generate a timestructure coalescent tree G^{rep,(p)}. The set of trees for p = 1, ..., P is the predictive distribution of genealogies.
5. Using equation (5), the posterior distribution of genealogies was compared with the predictive distribution of genealogies, resulting in a p_{B }value (using the D_{F }statistic to summary the genealogies, as D_{F }proved most powerful on the real data sets).
In the above scheme, the model used to simulate the data is the same as the model that we are testing against. Therefore we would expect the p_{B }values to be distributed approximately uniformly between 0 and 1 under the null hypothesis. Figure 3 shows the cumulative probability distribution for the p_{B }statistics calculated using the above scheme as well as for a 100 replicates where exponential growth (N_{e}τ = 5000, r = 2 × 10^{3}) was assumed instead of a constant population size. For the case of constant population size, the p_{B }values are distributed approximately uniformly, with 8 false positives (i.e. p_{B }< 0.05) suggesting that the test is neither overly sensitive nor too conservative (Figure 3). However for the case of exponential growth the values of p_{B }do not appear uniform with too few extreme values suggesting that the test would be conservative. These results strengthen the conclusion of a relative abundance of externalbranch mutations in the viral data sets analyzed in this paper, because the significant statistics observed for real data sets both under constantsize and exponential growth assumptions would not be expected under neutrality.
Figure 3. Cumulative distribution of p_{B}. (Cumulative distribution of p_{B }values (based on D_{F }statistic) on 100 simulated data sets under a constant population (open circles) and an exponentially growing population (closed circles). The ideal behaviour for p_{B }when applied to data simulated from the null distribution would be a uniform distribution (see main text for details). This plot shows that if the true demographic history is a constant population, then p_{B }will be a good test of neutrality. However, if the true demographic history is exponential growth p_{B }will be a conservative test, as can be seen by the lack of high p_{B }values in the closed circles.
Conclusion
The results presented here demonstrate the utility of posterior predictive simulation for testing the goodnessoffit of population genetic models of molecular evolution. In particular we tested the assumption of neutrality under both constant population size and exponential growth on four example data sets where temporally spaced data was available. In both dengue4 and the human influenza A viruses there was a significant excess of mutations on terminal branches whether or not exponential growth was assumed. In contrast gene trees of HRSV and the dloop of brown bears did not exhibit any significant departure from neutral expectations in terms of tree shape or genealogical distribution of mutations, although all four data sets had greater than average numbers of mutations on terminal branches relative to internal branches when compared to neutral expectations. Furthermore all four data sets had below average age of the root and below average tree length. In terms of treeimbalance, both above and below average imbalances were observed for all three treeimbalance statistics measured (B_{1}, C_{n}, I_{c}).
This paper has been primarily concerned with demonstrating the utility of using existing summary statistics for testing neutrality in temporally spaced data sets. While we have demonstrated that existing statistics, such as D_{F }can be successfully used to uncover nonneutral evolution it remains likely that better summary statistics may exist. We have described a method for comparing measures of tree shape with their expectations even if the tree shape statistic cannot be directly calculated from the sequence data. We hope that further development of test statistics of tree shape explicitly designed for temporally spaced data will proceed. By doing this we hope tests of recent phylodynamic theories [16] of genetic diversity and evolution in viral pathogens can be constructed. With the posterior predictive framework outlined here, new statistics should greatly increase our ability to detect nonneutral evolution and other departures from standard models of molecular evolution and population genetics. One potentially fruitful direction lies in examining violations of neutrality in the underlying substitution process, as well as in treeshape. Efficient methods to detect substitution model violations by comparing the expected numbers of different classes of nucleotide substitutions have already been introduced [45]. This allows future work to combine appropriate summary statistics across the full model parameter space in order to maximize statistical power to detect nonneutral evolution.
Our reliance on posterior predictive simulation may raise the concern that the observed data Y for each example is "used" twice, first in generating the posterior distribution of model parameters and then in estimating the teststatistic employed to reject the null hypothesis. An alternative approach utilizing prior predictive simulation exists [46] and satisfies the above criticism. However, prior predictive simulation is undefined under improper prior distributions [47] and may not offer sufficient statistical power when vague priors are employed [48,49], such is the case in this work. In general, if the model parameters G, Ω and θ are wellestimated given Y, then the posterior predictive pvalue yields results similar to classical pvalues (when available), while the prior predictive assessment is highly sensitive to prior distribution choice [47]. In addition to these fully Bayesian predictive methods, Bayes factors [50] are effective model selection tools in phylogenetics [51]. A Bayes factor measures the relative likelihood of two competing models. To compute the Bayes factor in favor of the null model of neutrality, one must specify an alternative model. Unless the researcher has firm a priori knowledge about how neutrality might be violated in their data, we recommend starting with rejecting the null through these predictive methods and only then attempting the difficult task of nonneutral model construction and fitting.
Although both dengue4 and the human influenza A viruses exhibit very ladderlike trees that are highly imbalanced, our analysis suggests that this amount of imbalance is not much more than would be expected given the sampling scheme and estimated effective population sizes. However it can be argued that small effective population sizes, by themselves, are evidence for selection. This is because effective population size (N_{e}) is a measure of the number of productively replicating individuals, only when the population is evolving under conditions of neutrality. In the absence of any such prior assumptions, N_{e }should be considered only as a surrogate measure of diversity in the population. Because diversity is reduced by selection, a low estimated N_{e }could be a sign that the population process is being driven by natural selection. Nevertheless, the results presented here emphasize that ladderlike trees, by themselves, do not necessarily suggest selection. Consequently, interpretation of tree shape imbalance should not be made in the absence of an understanding of the expectations under the null model. Overall, for the example data sets chosen in this study, tree shape did not seem to be a powerful indicator of nonneutral evolution. Finally, by incorporating a demographic model into the test framework, we have ruled out exponential growth as the reason for significant predictive probabilities (p_{B}) in all data sets besides dengue4. Nevertheless there remain a number of alternative explanations for neutrality being rejected.
Both human influenza A and dengue4 viruses show a significant excess of mutations on terminal branches when compared to the predictions of the best fitting parameters of the neutral model. These departures from neutrality lend insight into the process of molecular evolution in RNA viruses, and suggest that new models that take into account these departures need to be developed to accurately model their genetic variation. In contrast, at least with respect to tree shape and genealogical distribution of mutations, neutrality seems to be an approximately adequate model for the G gene of HRSV and the dloop of brown bears. We hope that further application of posterior predictive simulation will shed light on the pattern of withinpopulation genetic variation in a wide range of species and genetic elements.
Methods
To assess selective neutrality in evolution, traditional test statistics summarize either the observed sequence data Y directly or the shape and internode distribution of a fixed gene genealogy G relating the sequences, where G is assumed known. In general, however, G is unknown a priori and must also be inferred from the sequence data with considerable uncertainty for measurably evolving populations [23]. This presents a difficulty for classical statistical tests. We overcome this shortfall in a Bayesian framework using posterior predictive assessment of model fit [33,47]. In this framework, we estimate G and its associated uncertainty from Y using a statistical model of molecular evolution and population demography and simultaneously compare a summary statistic of the random genealogy G to the statistic's expectation under neutrality. Our approach relies on assuming a statistical model for molecular evolution under neutrality. We employ a standard choice based on a continuoustime Markov chain process for nucleotide substitution [52] and an underlying coalescent process to generate the genealogy [53]. In particular, we assume the [37] (HKY85) substitution model with discrete – distributed rate heterogeneity across sites [38] parameterized by Ω = (μ, κ, σ). Parameter μ is the overall rate of mutation, κ is the transition/transversion bias and σ is the Gamma shape parameter. We assume a demographic coalescent process that allows for exponential population growth parameterized by θ = (N_{e}τ, r). Parameter N_{e}τ is the product of the effective population size and generation time and r is the exponential growth rate. Restricting r = 0 results in a constant populationsize model. After assuming a prior distribution over (Ω, θ), we can approximate the posterior distribution
using Markov chain Monte Carlo (MCMC) techniques [54,55]. We refer interested readers to [22] for further details on prior choices and our MCMC approach. Simulation of (1) is readily available using the software BEAST [36].
With the tools to infer the random genealogy G and model parameters given sequence data in hand, we now consider summary statistics to assess the neutral model fit. Consider a vector of test statistics T(G) = [T_{1}(G), ..., T_{K}(G)] that summarize the shape of the genealogy G. Each element T_{k}(G) for k = 1, ..., K serves as a unique mapping between G and the real numbers and generally returns a small value if G were generated by a neutral process and a large value otherwise. One such example for T_{k}(G) is D_{F}. Different T_{k}(G) serve to detect different types of departures from the neutral tree form.
It is important to note that T(G) depends on an unknown model parameter in contrast to a classical test statistic that depends only on fixed quantities, such as the observed data Y or a fixed estimate of the genealogy . In the Bayesian literature, test statistics that depend on unknown model parameters (and also sometimes the data directly) are generally referred to as "discrepancy values" [48] to help differentiate them from classical measures. To simplify notation, we continue to refer to T(G) as a summary statistic with the implicit understanding that it is random and not directly observable. The advantage afforded by leaving T(G) a random variable is that we are now able to compare the discrepancy between the observed data Y and the posited neutral model as a whole, instead of between the data and the best fit of the model. To use T(G) to assess the model fit of neutrality, we consider the following thought experiment. Suppose we randomly simulate under a neutral model a genealogy G^{rep }from a replicated population almost identical to the population yielding the sequence data Y, where both populations share the same unknown demographic parameters θ, number of tips and tipdates. Then, we compare quantities T(G^{rep}) and T(G) given Y. Disparate values signify model misspecification caused by nonneutral evolutionary forces.
We recall that T(G^{rep}) and T(G) given Y are not fixed values, but are random variables represented by probability distributions. As a consequence, we must integrate over all possible realizations weighed by their posterior probabilities to generate a test based on T(·). This process is called posterior predictive simulation [4648,56]. Model selection and critique using posterior predictive simulation has had a successful history in phylogenetics [33,49,5759].
The central distribution that we require is the posterior predictive distribution of the test statistic
In practice, one approximates the predictive distribution in (2) by first generating a posterior sample {G^{(p)}, Ω^{(p)}, θ^{(p)}} for p = 1, ..., P from P(G, Ω, θY). Then, for each p, one draws
where P(G^{rep}θ) describes a selectively neutral coalescent process. Finally, one tabulates T(G^{rep,(p)}). We interpret this predictive distribution as a description of the values that T(·) generates when applied to genealogies from selectively neutral populations. To assess neutrality in the observed data, we compare the predictive distribution to the posterior distribution of the test statistic
approximated by tabulating T(G^{(p)}) for p = 1, ..., P.
When the test statistic T(·) is univariate [33], assessing differences between predictive and posterior distributions can be done in two ways [47]. The first method is graphical, generating a scatterplot of {[T(G^{rep,(p)}, T(G^{(p)})], p = 1, ..., P}. The second method is more formal, employing tailarea probabilities.
Let the posterior predictive pvalue [48]
then p_{B }remains welldefined even though T(G^{rep}) and T(G) given Y are not directly observable [47]. Probability p_{B }shares many characteristics with a classical pvalue; for example, p_{B }can be viewed as its posterior mean and, under the null hypothesis of neutrality, p_{B }is approximately distributed as a Uniform [0, 1) random variable [48]. Given these properties, we reject the selectively neutral model for extreme values of p_{B}, say p_{B }<α = 0.05 for strictly nonnegative T(·) or p_{B }<α/2 or p_{B }> 1  α/2 otherwise.
To calculate p_{B}, a consistent estimator is
where 1{·} is the indicator function, returning 1 if its argument is true and 0 otherwise.
When the test statistic T(·) is multivariate, we are able to detect a greater variety of departures from selective neutrality simultaneously, but a single tailarea probability becomes more troublesome to calculate. In this situation, we first standardize individual elements T_{k}(·) such that var [T_{k}(G)Y] = 1 for all k. This places all measures on a common scale. We then generate scatterplots of the multivariate distributions. We agree with [47] in that comparing the posterior and predictive distributions graphically can provide more information than reporting a single pvalue. For example, we can identify which components T_{k}(·) in T(·) contribute greatest to the discrepancy between the data and a selectively neutral model.
To calculate a tailarea probability in the multivariate setting, we turn to the (squared) Mahalanobis distance in constructing a posterior predictive test [60]. Let be an estimate of the predictive mean of T(G^{rep}) and be an estimate of its variancecovariance matrix, such that
for p = 1, ..., P. Then, we define the (squared) Mahalanobis distance
where we substitute T(G) for x when considering the distance's posterior distribution and T(G^{rep}) for x when considering its predictive distribution. Mahalanobis distances are commonly used in discrimination analysis and classification. The metric of the Mahalanobis distance M(·) is the inverse of the variancecovariance matrix of the predictive distribution and, as such, returns distances normalized relative to the multidimensional spread of the data under selective neutrality. Following in the light of Equation (5), we define the multivariate posterior predictive pvalue
A consistent estimator of the multivariate p_{B }is readily available in the vain of Equation (6).
When it is unclear a priori which elements T_{k}(·) provide the most power to reject selective neutrality, the multivariate approach sidesteps the multiple testing problem inherent in examining each element independently. In these situations, we consider first using (9) as a global test with a fixed Type I Error rate α and then subselecting a small number of individual T_{k}(·) for further univariate analysis. For researchers who begin by examining the K univariate analyses separately, we recommend applying a Bonferroni correction by decreasing the critical value cutoff from α to α/K per test. For large K, a Bonferrioni correction is overly conservative, especially when considering the potentially high correlation between T_{k}(·). At this point, monitoring the false discovery rate [61] becomes more practical.
Authors' contributions
Acknowledgements
The DIMACS Working Group on Phylogenetic Trees and Rapidly Evolving Diseases fostered the initial collaboration between A.J.D. and M.A.S. We thank Andrew Rambaut, Eddie Holmes, Oliver G. Pybus and Allen G. Rodrigo for helpful discussions. We thank Charles Edwards and Daniel Wilson for assistance in producing the simulation results. This research was funded in part by Wellcome Trust Grant 017979 (to A.J.D.) and NIH grant GM086887 and the John Simon Guggenheim Memorial Foundation (to M.A.S.).
References

Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.
Genetics 1989, 123:585595. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Fu Y, Li W: Statistical tests of neutrality of mutations.
Genetics 1993, 133:693709. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Fu Y: Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection.
Genetics 1997, 147:915925. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Fu YX: New statistical tests of neutrality for DNA samples from a population.
Genetics 1996, 143:557570. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Fay J, Wu C: Hitchhiking under positive Darwinian selection.
Genetics 2000, 155:14051413. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Strobeck C: Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision.
Genetics 1987, 117:149153. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hudson R, Bailey K, Skarecky D, Kwiatowski J, Ayala FJ: Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster.
Genetics 1994, 136:13291340. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Slatkin M, Hudson R: Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations.
Genetics 1991, 129:555562. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Mousset S, Derome N, Veuille M: A test of neutrality and constant population size based on the mismatch distribution.
Molecular Biology and Evolution 2004, 21:724731. Publisher Full Text

RamosOnsins S, Rozas J: Statistical properties of new neutrality tests against population growth.

Przeworski M: The Signature of Positive Selection at Randomly Chosen Loci.
Genetics 2002, 160:11791189. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Haddrill P, Thornton K, Charlesworth B, Andolfatto P: Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations.
Genome Research 2005, 15:790799. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Innan H, Zhang K, Marjoram P, Tavare S, Rosenberg N: Statistical tests of the coalescent model based on the haplotype frequency distribution and the number of segregating sites.
Genetics 2005, 169:17631777. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Innan H: Modified HudsonKreitmanAguade test and twodimensional evaluation of neutrality tests.
Genetics 2006, 173:17251733. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Li H, Stephan W: Inferring the Demographic History and Rate of Adaptive Substitution in Drosophila.

Grenfell B, Pybus O, Gog J, Wood J, Daly J, Mumford J, Holmes E: Unifying the epidemiological and evolutionary dynamics of pathogens.
Science 2004, 303:327332. PubMed Abstract  Publisher Full Text

Kelly JK: A test of neutrality based on interlocus associations.
Genetics 1997, 146(3):11971206. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Colless D: Review of "Phylogenetics: The Theory and Practice of Phylogenetic Systematics".
Systematic Zoology 1982, 31:100104. Publisher Full Text

Hasegawa M, Cao Y, Yang Z: Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species.

McKenzie A, Steel M: Distributions of cherries for two models of trees.
Mathematical Biosciences 2000, 164:8192. PubMed Abstract  Publisher Full Text

Aldous D: Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today.
Statistical Science 2001, 16:2334. Publisher Full Text

Drummond A, Nicholls G, Rodrigo A, Solomon W: Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data.
Genetics 2002, 161:13071320. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Drummond A, Pybus O, Rambaut A, Forsberg R, Rodrigo A: Measurably evolving populations.
Trends in Ecology & Evolution 2003, 18:481488. Publisher Full Text

McDonald J, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila.
Nature 1991, 351:652654. PubMed Abstract  Publisher Full Text

Nielsen R, Weinreich DM: The age of nonsynonymous and synonymous mutations in animal mtDNA and implications for the mildly deleterious theory.
Genetics 1999, 153:497506. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

EyreWalker A, Keightley P, Smith N, Gaffney D: Quantifying the slightly deleterious mutation model of molecular evolution.

Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh YP, Hahn MW, Nista PM, Jones CD, Kern AD, Dewey CN, Pachter L, Myers E, Langley CH: Population genomics: wholegenome analysis of polymorphism and divergence in Drosophila simulans.
PLoS Biol 2007, 5(11):e310. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hahn MW: Toward a selection theory of molecular evolution.
Evolution 2008, 62(2):255265. PubMed Abstract  Publisher Full Text

EyreWalker A: Changing effective population size and the McDonaldKreitman test.
Genetics 2002, 162:20172024. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Williamson S, Orive M: The genealogy of a sequence subject to purifying selection at multiple sites.

Avise J: Phylogeography: The History and Formation of Species. Cambridge, MA: Harvard University Press; 2000.

McVean G, Charlesworth B: The effects of HillRobertson interference between weakly selected mutations on patterns of molecular evolution and variation.
Genetics 2000, 155:929944. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Edwards C, Holmes E, Pybus O, Wilson D, Viscidi R, Abrams E, Phillips R, Drummond A: Evolution of the HIV1 envelope gene is dominated by purifying selection.
Genetics 2006, 174:14411453. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Kirkpatrick M, Slatkin M: Searching for evolutionary patterns in the shape of a phylogenetic tree.
Evolution 1993, 47:11711181. Publisher Full Text

Barnes I, Matheus P, Shapiro B, Jensen D, Cooper A: Dynamics of Pleistocene population extinctions in Beringian brown bears.
Science 2002, 295:22672270. PubMed Abstract  Publisher Full Text

Drummond AJ, Rambaut A: BEAST: Bayesian evolutionary analysis by sampling trees.
BMC Evol Biol 2007, 7:214. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Hasegawa M, Kishino H, Yano T: Dating the humanape splitting by a molecular clock of mitochondrial DNA.
Journal of Molecular Evolution 1985, 22:160174. PubMed Abstract  Publisher Full Text

Yang Z: Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods.
Journal of Molecular Evolution 1994, 39:306314. PubMed Abstract  Publisher Full Text

Newton M, Raftery A: Approximate Bayesian inference with the weighted likelihood bootstrap.
Journal of the Royal Statistical Society, Series B 1994, 56:348.

Suchard M, Kitchen C, Sinsheimer J, Weiss R: Hierarchical phylogeneic models for analyzing multipartite sequence data.
Systematic Biology 2003, 52:649664. PubMed Abstract  Publisher Full Text

Zlateva K, Lemey P, Vandamme A, van Ranst M: Molecular evolution and circulation patterns of human respiratory syncytial virus subgroup A: positively selected sites in the attachment g glycoprotein.
Journal of Virology 2004, 78:46754683. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Bennett S, Holmes E, Chirivella M, Rodriguez D, Beltran M, Vorndam V, Gubler D, McMillan W: Selectiondriven evolution of emergent dengue virus.
Molecular Biology and Evolution 2003, 20:16501658. Publisher Full Text

Fitch W, Bush RM, Bender C, Cox N: Long term trends in the evolution of H(3) HA1 human influenza type A.
Proceedings of the National Academy of Sciences, USA 1997, 94:77127718. Publisher Full Text

Ferguson N, Galvani A, Bush R: Ecological and immunological determinants of in uenza evolution.
Nature 2003, 422:428433. PubMed Abstract  Publisher Full Text

Minin V, Suchard M: Fast, accuract and simulationfree stochastic mapping.

Box G: Sampling and Bayes inference in scientific modeling and robustness.
Journal of the Royal Statistical Society, Series A 1980, 143:383430. Publisher Full Text

Gelman A, Meng X, Stern H: Posterior predictive assessment of model fitness via realized discrepancies (with discussion).

Meng XL: Posterior predictive pvalues.
Annals of Statistics 1994, 22:11421160. Publisher Full Text

Suchard M, Weiss R, Sinsheimer J, Dorman K, Patel M, McCabe E: Evolutionary similarity among genes.
Journal of the American Statistical Assocation 2003, 98:653662. Publisher Full Text

Kass R, Raftery A: Bayes factors.
Journal of the American Statistical Association 1995, 90:773795. Publisher Full Text

Suchard M, Weiss R, Sinsheimer J: Bayesian selection of continuoustime Markov chain evolutionary models.

Lange K: Mathematical and Statistical Methods for Genetic Analysis. New York, NY: Springer; 1997.

Stochastic Processes and their Applications 1982, 13:235248. Publisher Full Text

Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E: Equation of state calculations by fast computing machines.
Journal of Chemical Physics 1953, 21:10871092. Publisher Full Text

Hastings W: Monte Carlo sampling methods using Markov chains and their applications.
Biometrika 1970, 57:97109. Publisher Full Text

Rubin D: Bayesianly justifiable and relevant frequency calculations for the applied statisticians.
Annals of Statistics 1984, 12:11511172. Publisher Full Text

Suchard M, Weiss R, Dorman K, Patel M, McCabe E, Sinsheimer J: Evolutionary similarity among genes when data are sparse. In Proceedings of the Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Assocation; 2000:9297.

Bollback J: Bayesian model adequacy and choice in phylogenetics.

Nielsen R, Huelsenbeck J: Detecting positively selected amino acid sites using posterior predictive Pvalues.

O'Hagan A: HSSS model criticism (with discussion). In Highly Structured Stochastic Systems. Edited by Green P, Hjort N, Richardson S. Oxford, UK: Oxford University Press; 2003:423453.

Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing.
Journal of the Royal Statisical Society, Series B 1995, 57:289300.