Research article

The non-random clustering of non-synonymous substitutions and its relationship to evolutionary rate

Lisa G McFerrin 1 and Eric A Stone 1,2*

Author affiliations

1 Graduate program in Bioinformatics, North Carolina State University, Raleigh, NC, USA 27695-7566

2 Department of Genetics, North Carolina State University, Raleigh, NC, USA 27695-7614

Citation and License

BMC Genomics 2011, 12:415  doi:10.1186/1471-2164-12-415

Published: 16 August 2011



Protein sequences are subject to a mosaic of constraint. Changes to functional domains and buried residues, for example, are more likely to disrupt protein structure and function than are changes to residues that participate in loops or are exposed to solvent. Regions of constraint on the tertiary structure of a protein often result in a loose segmentation of its primary structure into stretches of slowly and rapidly evolving amino acids. This clustering can be exploited, and existing methods have done so by relying on local sequence conservation as a signature of selection to help identify functionally important regions within proteins. We invert this paradigm by leveraging the regional nature of protein structure and function to both illuminate and make use of genome-wide patterns of local sequence conservation.


Our hypothesis is that the regional nature of structural and functional constraints will impose a positive autocorrelation on the evolutionary rates of neighboring sites, which, in a pairwise comparison of orthologous proteins, will manifest itself as the clustering of non-synonymous changes across the amino acid sequence. We introduce a dispersion ratio statistic to test this and related hypotheses. Using genome-wide interspecific comparisons of orthologous protein pairs, we reveal a strong log-linear relationship between the degree of clustering and the intensity of constraint. We further demonstrate how this relationship varies with the evolutionary distance between the species being compared. We also provide some evidence that proteins with a history of positive selection deviate from genome-wide trends.
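To make the idea of clustering concrete, the sketch below shows one generic way to quantify it for a single pair of aligned orthologous proteins. It is an illustration under stated assumptions, not the dispersion ratio statistic introduced in the article, whose definition is not given in this summary. The sketch substitutes a classical variance-to-mean ratio (index of dispersion) of substitution counts in fixed windows, with a simple permutation test against uniformly scattered changes. The function names, the window size of 10 residues, and the toy sequences are all hypothetical.

```python
import random
from statistics import mean, pvariance


def substitution_positions(seq_a, seq_b):
    """Return indices where two aligned, equal-length protein sequences differ.

    Columns containing a gap character ('-') are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b and a != "-" and b != "-"
    ]


def index_of_dispersion(positions, length, window=10):
    """Variance-to-mean ratio of substitution counts in non-overlapping windows.

    Values well above 1 suggest clustering; values below 1 suggest even spacing.
    Positions in a trailing partial window are ignored.
    """
    n_windows = length // window
    if n_windows < 2:
        raise ValueError("need at least two full windows")
    counts = [0] * n_windows
    for p in positions:
        w = p // window
        if w < n_windows:
            counts[w] += 1
    m = mean(counts)
    if m == 0:
        return float("nan")  # no substitutions fall in any full window
    return pvariance(counts) / m


def permutation_p_value(positions, length, window=10, n_perm=1000, seed=0):
    """One-sided p-value: how often do randomly placed changes look as clustered?"""
    rng = random.Random(seed)
    observed = index_of_dispersion(positions, length, window)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.sample(range(length), len(positions))
        if index_of_dispersion(shuffled, length, window) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    clustered = list(range(10))          # ten changes packed into one window
    spread = list(range(0, 100, 10))     # ten changes, one per window
    print(index_of_dispersion(clustered, 100), permutation_p_value(clustered, 100))
    print(index_of_dispersion(spread, 100), permutation_p_value(spread, 100))
```

A window-based ratio is sensitive to the choice of window size and to where window boundaries fall, so any real analysis would need to check robustness to those choices or use a statistic that avoids windows altogether.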


We find a significant association between the evolutionary rate of a protein and the degree to which non-synonymous changes cluster along its primary sequence. We show that clustering is a non-redundant predictor of evolutionary rate, and we speculate that conflicting signals of clustering and constraint may indicate a historical period of relaxed selection.