Open Access Research article

Bayesian semiparametric regression models to characterize molecular evolution

Saheli Datta1*, Abel Rodriguez2 and Raquel Prado2

Author Affiliations

1 Fred Hutchinson Cancer Research Center, Seattle, WA, USA

2 Department of Applied Mathematics and Statistics, University of California Santa Cruz, Santa Cruz, CA, USA


BMC Bioinformatics 2012, 13:278  doi:10.1186/1471-2105-13-278

Published: 30 October 2012



Statistical models and methods that associate changes in the physicochemical properties of amino acids with natural selection at the molecular level typically do not take into account the correlations between such properties. We propose a Bayesian hierarchical regression model with a generalization of the Dirichlet process prior on the distribution of the regression coefficients that describes the relationship between the changes in amino acid distances and natural selection in protein-coding DNA sequence alignments.
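To illustrate why a Dirichlet process prior on the regression coefficients induces groups of properties with a shared effect, the following is a minimal sketch using the Chinese restaurant process representation of a plain Dirichlet process. It is not the generalized prior proposed in the article; the function name `crp_coefficients`, the concentration parameter `alpha`, and the Gaussian base measure are all assumptions made for illustration.

```python
import random

def crp_coefficients(n_properties, alpha, seed=0, base_sd=1.0):
    """Draw one coefficient per property from a Dirichlet process prior
    (Chinese restaurant process representation).

    Hypothetical helper: the Gaussian base measure and all names are
    illustrative assumptions, not the article's model.
    """
    rng = random.Random(seed)
    cluster_values = []  # one shared coefficient per cluster
    cluster_sizes = []   # number of properties assigned to each cluster
    labels = []          # cluster index for each property

    for _ in range(n_properties):
        # An existing cluster is chosen with weight equal to its size;
        # a new cluster is opened with weight alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            # New cluster: draw its coefficient from the base measure.
            cluster_values.append(rng.gauss(0.0, base_sd))
            cluster_sizes.append(1)
        else:
            cluster_sizes[k] += 1
        labels.append(k)

    coefficients = [cluster_values[k] for k in labels]
    return labels, coefficients


if __name__ == "__main__":
    labels, coefficients = crp_coefficients(n_properties=20, alpha=1.0)
    print("cluster labels:", labels)
    print("distinct coefficients:", len(set(coefficients)))
```

Properties that receive the same label share exactly the same coefficient, which is the mechanism that lets a model of this kind report groups of properties with a similar effect. Smaller values of `alpha` tend to produce fewer, larger groups.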


The Bayesian semiparametric approach is illustrated with simulated data and the abalone sperm lysin data. Our method identifies groups of properties that, for this particular dataset, have a similar effect on evolution. The model also provides nonparametric site-specific estimates for the strength of conservation of these properties.


The model described here is distinguished by its ability to handle a large number of amino acid properties simultaneously, while taking into account that such data can be correlated. The multi-level clustering ability of the model allows for appealing interpretations of the results in terms of properties that are roughly equivalent from the standpoint of molecular evolution.