Correlated mutations via regularized multinomial regression
1 Central Tuber Crops Research Institute, Thiruvananthapuram-695017, Kerala, India
2 Biometris, Wageningen University and Research Centre, Box 100, 6700 AC Wageningen, The Netherlands
3 Applied Bioinformatics, Plant Research International, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
4 Keygene N.V., P.O. Box 216, 6700 AE Wageningen, The Netherlands
BMC Bioinformatics 2011, 12:444 doi:10.1186/1471-2105-12-444
Published: 14 November 2011
In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be used to predict intra- or intermolecular contacts. Although various approaches exist for detecting such correlated mutations, these methods generally use only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies.
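To make the notion of a pairwise correlation score concrete, the following is a minimal sketch that computes the mutual information between two alignment columns. Mutual information is used here only as a familiar example of a pairwise measure; the function name `column_mutual_information` and the toy alignment are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences (hypothetical toy input).
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    left_counts = Counter(seq[i] for seq in msa)
    right_counts = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(count * n / (left_counts[a] * right_counts[b]))
    return mi


# Toy alignment: columns 0 and 1 always change together, column 2 varies independently.
toy_msa = ["AKL", "AKV", "GEL", "GEV"]
coupled = column_mutual_information(toy_msa, 0, 1)
independent = column_mutual_information(toy_msa, 0, 2)
```

A score of this kind looks at two columns at a time, so by itself it cannot tell whether two columns covary directly or only because both depend on a third column.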
We propose RMRCM, a method that uses Regularized Multinomial Regression to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets, we analyzed the performance of our approach in predicting residue-residue contacts and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates that the proposed algorithm performs well in predicting residue-residue contacts compared with previous results. RMRCM can also be applied to predict interactions (rather than only interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions.
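The general idea of regressing one alignment column on all other columns under a sparsity penalty can be sketched as follows. This is an illustration under stated assumptions, not the RMRCM implementation: it assumes a plain L1 penalty, no intercept, a simple proximal-gradient solver, and a per-column score equal to the summed absolute coefficients. All function names, the parameter values, and the toy alignment are hypothetical; the penalty, solver, and scoring used in the paper may differ.

```python
from math import exp


def one_hot_predictors(msa, target):
    """One-hot encode every column except `target`; also return (column, residue) per feature."""
    columns = [j for j in range(len(msa[0])) if j != target]
    features = [(j, aa) for j in columns for aa in sorted({s[j] for s in msa})]
    rows = [[1.0 if seq[j] == aa else 0.0 for j, aa in features] for seq in msa]
    return rows, features


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(value, cutoff):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    if value > cutoff:
        return value - cutoff
    if value < -cutoff:
        return value + cutoff
    return 0.0


def fit_l1_multinomial(rows, labels, penalty=0.05, step=0.5, iterations=500):
    """L1-penalized multinomial (softmax) regression via proximal gradient descent."""
    classes = sorted(set(labels))
    n, d, k = len(rows), len(rows[0]), len(classes)
    weights = [[0.0] * k for _ in range(d)]
    for _ in range(iterations):
        grad = [[0.0] * k for _ in range(d)]
        for x, label in zip(rows, labels):
            logits = [sum(x[f] * weights[f][c] for f in range(d)) for c in range(k)]
            probs = softmax(logits)
            for c in range(k):
                residual = probs[c] - (1.0 if classes[c] == label else 0.0)
                for f in range(d):
                    if x[f]:
                        grad[f][c] += residual / n
        for f in range(d):
            for c in range(k):
                updated = weights[f][c] - step * grad[f][c]
                weights[f][c] = soft_threshold(updated, step * penalty)
    return weights


def column_links(msa, target, penalty=0.05):
    """Score each other column by the summed |coefficient| it receives when predicting `target`."""
    rows, features = one_hot_predictors(msa, target)
    labels = [seq[target] for seq in msa]
    weights = fit_l1_multinomial(rows, labels, penalty)
    scores = {}
    for (j, _), w in zip(features, weights):
        scores[j] = scores.get(j, 0.0) + sum(abs(v) for v in w)
    return scores


# Toy alignment (assumed): column 1 tracks column 0, column 2 is unrelated.
toy_msa = ["AKL", "AKV", "GEL", "GEV"] * 2
links = column_links(toy_msa, target=0)
```

On this toy alignment, column 1 receives a nonzero score while the penalty leaves column 2 at exactly zero, which is the sense in which regularization limits the number of predicted links. Repeating the fit with each column as the target would give a score for every ordered pair of columns.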
A novel method is presented that uses regularized multinomial regression to obtain correlated mutations from protein multiple sequence alignments.