Fast optimization of statistical potentials for structurally constrained phylogenetic models
1 Département d'Informatique, LIRMM, 161 rue Ada, 34392 Montpellier Cedex 5, France
2 Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
3 Department of Biology, University of Ottawa, Ottawa, Ontario, Canada
BMC Evolutionary Biology 2009, 9:227 doi:10.1186/1471-2148-9-227. Published: 9 September 2009
Statistical approaches for protein design are relevant to molecular evolutionary studies. In recent years, so-called structurally constrained (SC) models of protein-coding sequence evolution have been proposed, which use statistical potentials to assess sequence-structure compatibility. In previous work, we defined a statistical framework for optimizing knowledge-based potentials especially suited to SC models. Our method used the maximum likelihood principle and provided what we call the joint potentials. However, it required numerical estimation using computationally intensive Markov chain Monte Carlo (MCMC) sampling algorithms.
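For context, a statistical potential written in a generic Boltzmann-like form (assumed here for illustration; the exact energy terms of the joint potentials may differ) assigns a sequence s of N sites on a structure S the probability below, where ε denotes the potential parameters, E the energy and 𝒜 the 20-letter amino-acid alphabet. The normalizing constant Z sums over all 20^N possible sequences, which is one reason a joint maximum likelihood fit calls for numerical estimation such as MCMC sampling.

```latex
p(s \mid S, \varepsilon) = \frac{\exp\{-E(s, S; \varepsilon)\}}{Z(S; \varepsilon)},
\qquad
Z(S; \varepsilon) = \sum_{s' \in \mathcal{A}^{N}} \exp\{-E(s', S; \varepsilon)\},
\qquad |\mathcal{A}| = 20
```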
Here, we develop an alternative optimization procedure, based on a leave-one-out argument coupled to fast gradient descent algorithms. We show that the leave-one-out potential yields results very similar to those of the joint approach developed previously, both in terms of the resulting potential parameters and by Bayes factor evaluation in a phylogenetic context. At the same time, the leave-one-out approach brings a considerable computational benefit (up to a 1,000-fold decrease in computational time for the optimization procedure).
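The leave-one-out idea can be illustrated with a minimal sketch. It assumes, hypothetically, a symmetric pairwise contact potential `eps` over K residue classes, and reads "leave-one-out" as conditioning each site's residue on all other residues and the structure, so that each site's normalizer sums over only K residue types rather than over every possible sequence (a pseudo-likelihood formulation that may differ from the paper's exact definition). Because the energies are linear in `eps`, the objective is concave, so plain fixed-step gradient ascent with an L2 penalty (both our own choices, standing in for the faster gradient methods mentioned above) is enough for a toy example. All function names and the toy data are hypothetical.

```python
import math


def site_energies(seq, contacts, eps, i, K):
    """Energy E_i(a) of every residue class a at site i, with all other
    sites fixed at their observed residues (the leave-one-out context)."""
    energies = [0.0] * K
    for j in contacts[i]:
        b = seq[j]
        for a in range(K):
            energies[a] += eps[a][b]
    return energies


def objective_and_grad(seqs, contacts, eps, K, lam):
    """Sum over sequences and sites of log p(a_i | rest), where
    p(a | rest) is proportional to exp(-E_i(a)), minus an L2 penalty
    (lam / 2) * ||eps||^2. Returns the value and its gradient with
    respect to every entry of the K x K matrix eps."""
    value = -0.5 * lam * sum(v * v for row in eps for v in row)
    grad = [[-lam * eps[x][y] for y in range(K)] for x in range(K)]
    for seq in seqs:
        for i in range(len(seq)):
            E = site_energies(seq, contacts, eps, i, K)
            m = min(E)  # shift so the log-sum-exp cannot overflow
            log_z = -m + math.log(sum(math.exp(-(e - m)) for e in E))
            value += -E[seq[i]] - log_z
            probs = [math.exp(-e - log_z) for e in E]
            for j in contacts[i]:
                b = seq[j]
                grad[seq[i]][b] -= 1.0  # derivative of -E_i(observed residue)
                for a in range(K):
                    grad[a][b] += probs[a]  # derivative of -log Z_i
    return value, grad


def optimize(seqs, contacts, K, lam=0.1, lr=0.002, steps=500):
    """Fixed-step gradient ascent on the penalized leave-one-out objective.
    eps[x][y] and eps[y][x] are tied, so the matrix stays symmetric."""
    eps = [[0.0] * K for _ in range(K)]
    history = []
    for _ in range(steps):
        value, g = objective_and_grad(seqs, contacts, eps, K, lam)
        history.append(value)
        for x in range(K):
            for y in range(x, K):
                # A tied off-diagonal parameter receives both gradient entries.
                d = g[x][y] + g[y][x] if x != y else g[x][x]
                eps[x][y] += lr * d
                eps[y][x] = eps[x][y]
    return eps, history


# Hypothetical toy data: a 6-site ring "structure" and 3 residue classes.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
contacts = [[] for _ in range(6)]
for i, j in ring:
    contacts[i].append(j)
    contacts[j].append(i)
seqs = [[0, 0, 1, 2, 1, 0], [0, 0, 2, 1, 2, 0],
        [1, 2, 1, 2, 0, 0], [2, 1, 2, 0, 0, 0]]

eps, history = optimize(seqs, contacts, K=3)
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
for row in eps:
    print(" ".join(f"{v:+.3f}" for v in row))
```

Each objective evaluation costs one pass over the sites with a K-term sum per site, which is what makes this kind of objective cheap compared with estimating a normalizer over whole sequences.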
Because of its computational speed, the proposed optimization method offers an attractive way to design and empirically evaluate alternative forms of potentials using large data sets and high-dimensional parameterizations.