Research article

Biophysical and structural considerations for protein sequence evolution

Johan A Grahnen 1, Priyanka Nandakumar 1,2, Jan Kubelka 3 and David A Liberles 1*

Author affiliations

1 Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA

2 Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA

3 Department of Chemistry, University of Wyoming, Laramie, WY 82071, USA


Citation and License

BMC Evolutionary Biology 2011, 11:361  doi:10.1186/1471-2148-11-361

Published: 16 December 2011



Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolution do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in the expected improvements in model performance. This suggests that further innovations are needed for progress in this field.


Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue types. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS < 1 and gamma-distributed rates across sites.


Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements to the representation of structure, folding energy, and binding function, for both native and non-native conformations, and describe a large number of possible applications for such a model.