Bayesian semiparametric regression models to characterize molecular evolution
1 Fred Hutchinson Cancer Research Center, Seattle, WA, USA
2 Department of Applied Mathematics and Statistics, University of California Santa Cruz, Santa Cruz, CA, USA
BMC Bioinformatics 2012, 13:278 doi:10.1186/1471-2105-13-278
Published: 30 October 2012
Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Dirichlet process prior on the distribution of the regression coefficients that describes the relationship between the changes in amino acid distances and natural selection in protein-coding DNA sequence alignments.
The Bayesian semiparametric approach is illustrated with simulated data and the abalone sperm lysin data. Our method identifies groups of properties that, for this particular dataset, have a similar effect on evolution. The model also provides nonparametric site-specific estimates for the strength of conservation of these properties.
The model described here is distinguished by its ability to handle a large number of amino acid properties simultaneously, while taking into account that such data can be correlated. The multi-level clustering ability of the model allows for appealing interpretations of the results in terms of properties that are roughly equivalent from the standpoint of molecular evolution.
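To make the clustering idea concrete, the following is a minimal sketch of how a Dirichlet process prior induces ties among regression coefficients, so that several amino acid properties can share one effect. It is not the model proposed here: it uses an ordinary (non-generalized) Dirichlet process with a truncated stick-breaking construction, a standard normal base distribution, and no likelihood or posterior inference. The function names, the concentration parameter `alpha`, and the truncation level are all assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    alpha is the (assumed) concentration parameter; smaller values
    concentrate mass on fewer atoms, giving fewer clusters.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom takes what is left
    return weights


def sample_dp_coefficients(n_properties, alpha=1.0, truncation=50, seed=0):
    """Draw one coefficient per property from a truncated DP prior.

    Properties assigned to the same atom share an identical coefficient,
    which is the source of the clustering behaviour.
    """
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Atoms drawn from an assumed N(0, 1) base distribution.
    atoms = [rng.gauss(0.0, 1.0) for _ in range(truncation)]
    labels = rng.choices(range(truncation), weights=weights, k=n_properties)
    coefficients = [atoms[k] for k in labels]
    return labels, coefficients


if __name__ == "__main__":
    labels, coefficients = sample_dp_coefficients(n_properties=20, alpha=1.0)
    print("distinct clusters:", len(set(labels)))
    for label, coef in zip(labels, coefficients):
        print(label, round(coef, 3))
```

Properties that receive the same label carry exactly the same coefficient in a given draw, which is what allows groups of properties to be read as roughly equivalent. A multi-level version would repeat this construction at a second level, for example across sites.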