Fully Bayesian tests of neutrality using genealogical summary statistics
1 Bioinformatics Institute, University of Auckland, Private Bag 92019, Auckland, New Zealand
2 Department of Computer Science, University of Auckland, Private Bag 92019, Auckland, New Zealand
3 Departments of Biomathematics and Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
4 Department of Biostatistics, UCLA School of Public Health, Los Angeles, California, USA
BMC Genetics 2008, 9:68 doi:10.1186/1471-2156-9-68
Published: 31 October 2008
Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome.
Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies, and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size.
Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously neglected because of the limited availability of theory and methods.
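The idea of a posterior predictive test on a genealogical summary statistic can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes a constant-size coalescent in which k lineages coalesce at rate k(k-1)/(2θ), uses a hypothetical statistic (`root_fraction`, the share of tree height spent in the final two-lineage interval), and replaces real MCMC output with synthetic posterior draws. All function names and the gamma distribution used to generate θ are illustrative assumptions.

```python
import random


def coalescent_intervals(n, theta, rng):
    """Waiting times between coalescent events for n lineages.

    Assumes a constant-size coalescent: while k lineages remain,
    the waiting time is exponential with rate k(k-1) / (2 * theta).
    """
    return [rng.expovariate(k * (k - 1) / (2.0 * theta))
            for k in range(n, 1, -1)]


def root_fraction(intervals):
    """Hypothetical summary statistic: fraction of the tree height
    spent in the final interval, when only two lineages remain."""
    return intervals[-1] / sum(intervals)


def posterior_predictive_p(posterior, n, statistic, rng):
    """Posterior predictive p-value for a genealogical statistic.

    `posterior` is a list of (theta, intervals) pairs, one per
    posterior sample. For each sample, a replicate genealogy is
    simulated under the neutral model with the same theta, and the
    statistic of the replicate is compared with that of the sampled
    genealogy.
    """
    exceed = 0
    for theta, intervals in posterior:
        replicate = coalescent_intervals(n, theta, rng)
        if statistic(replicate) >= statistic(intervals):
            exceed += 1
    return exceed / len(posterior)


rng = random.Random(1)
n = 10

# Synthetic stand-in for MCMC output (assumption): genealogies that
# were in fact generated by the neutral model being tested.
posterior = []
for _ in range(2000):
    theta = rng.gammavariate(2.0, 0.5)
    posterior.append((theta, coalescent_intervals(n, theta, rng)))

p_value = posterior_predictive_p(posterior, n, root_fraction, rng)
```

Because the synthetic genealogies here come from the same model used to simulate the replicates, `p_value` should fall near 0.5. Values close to 0 or 1 would indicate that the sampled genealogies have a shape that the neutral model rarely produces. Conditioning each replicate on the sampled θ is what allows uncertainty in the demographic parameter to be carried through the test.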