Open Access Research article

Bayesian semiparametric regression models to characterize molecular evolution

Saheli Datta1*, Abel Rodriguez2 and Raquel Prado2

Author affiliations

1 Fred Hutchinson Cancer Research Center, Seattle, WA, USA

2 Department of Applied Mathematics and Statistics, University of California Santa Cruz, Santa Cruz, CA, USA

Citation and License

BMC Bioinformatics 2012, 13:278  doi:10.1186/1471-2105-13-278

Published: 30 October 2012

Abstract

Background

Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Dirichlet process prior on the distribution of the regression coefficients that describes the relationship between the changes in amino acid distances and natural selection in protein-coding DNA sequence alignments.
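As a rough illustration of why a Dirichlet process prior leads to groups of properties with a shared effect, the sketch below samples cluster labels from a Chinese restaurant process, the partition distribution induced by a standard Dirichlet process. It is not the generalized prior proposed in the article, and the function name `crp_assignments`, the concentration value `alpha`, and the choice of 20 items are assumptions made only for this example.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Illustrative only: a standard Dirichlet process partition, not the
    generalized prior used in the article. alpha must be positive.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items already in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


# Hypothetical usage: 20 items standing in for amino acid properties.
labels = crp_assignments(n_items=20, alpha=1.0)
print(labels)
```

Items that receive the same label would share one regression coefficient in a model of this kind. Larger values of `alpha` tend to produce more, smaller groups.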

Results

The Bayesian semiparametric approach is illustrated with simulated data and the abalone lysin sperm data. Our method identifies groups of properties which, for this particular dataset, have a similar effect on evolution. The model also provides nonparametric site-specific estimates for the strength of conservation of these properties.

Conclusions

The model described here is distinguished by its ability to handle a large number of amino acid properties simultaneously, while taking into account that such data can be correlated. The multi-level clustering ability of the model allows for appealing interpretations of the results in terms of properties that are roughly equivalent from the standpoint of molecular evolution.