Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Methodology article

Coupled mutation finder: A new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations

Mehmet Gültas1*, Martin Haubrock2, Nesrin Tüysüz3 and Stephan Waack1*

Author Affiliations

1 Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, Göttingen, 37077, Germany

2 Department of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany

3 Erasmus MC Stem Cell Institute, Department of Cell Biology, Erasmus Medical Center, Rotterdam, The Netherlands

For all author emails, please log on.

BMC Bioinformatics 2012, 13:225  doi:10.1186/1471-2105-13-225

Published: 11 September 2012

Abstract

Background

The detection of significant compensatory mutation signals in multiple sequence alignments (MSAs) is often complicated by noise. A challenging problem in bioinformatics is remains the separation of significant signals between two or more non-conserved residue sites from the phylogenetic noise and unrelated pair signals. Determination of these non-conserved residue sites is as important as the recognition of strictly conserved positions for understanding of the structural basis of protein functions and identification of functionally important residue regions. In this study, we developed a new method, the Coupled Mutation Finder (CMF) quantifying the phylogenetic noise for the detection of compensatory mutations.

Results

To demonstrate the effectiveness of this method, we analyzed essential sites of two human proteins: epidermal growth factor receptor (EGFR) and glucokinase (GCK). Our results suggest that the CMF is able to separate significant compensatory mutation signals from the phylogenetic noise and unrelated pair signals. The vast majority of compensatory mutation sites found by the CMF are related to essential sites of both proteins and they are likely to affect protein stability or functionality.

Conclusions

The CMF is a new method, which includes an MSA-specific statistical model based on multiple testing procedures that quantify the error made in terms of the false discovery rate and a novel entropy-based metric to upscale BLOSUM62 dissimilar compensatory mutations. Therefore, it is a helpful tool to predict and investigate compensatory mutation sites of structural or functional importance in proteins. We suggest that the CMF could be used as a novel automated function prediction tool that is required for a better understanding of the structural basis of proteins. The CMF server is freely accessible at http://cmf.bioinf.med.uni-goettingen.de. webcite