Mutagenesis Objective Search and Selection Tool (MOSST): an algorithm to predict structure-function related mutations in proteins
Centre for Biochemical Engineering and Biotechnology, Institute for Cell Dynamics and Biotechnology: a Centre for Systems Biology, University of Chile, Beauchef 850, Santiago, Chile
BMC Bioinformatics 2011, 12:122 doi:10.1186/1471-2105-12-122Published: 27 April 2011
Functionally relevant artificial or natural mutations are difficult to assess or predict if no structure-function information is available for a protein. This is especially important to correctly identify functionally significant non-synonymous single nucleotide polymorphisms (nsSNPs) or to design a site-directed mutagenesis strategy for a target protein. A new and powerful methodology is proposed to guide these two decision strategies, based only on conservation rules of physicochemical properties of amino acids extracted from a multiple alignment of a protein family where the target protein belongs, with no need of explicit structure-function relationships.
A statistical analysis is performed over each amino acid position in the multiple protein alignment, based on different amino acid physical or chemical characteristics, including hydrophobicity, side-chain volume, charge and protein conformational parameters. The variances of each of these properties at each position are combined to obtain a global statistical indicator of the conservation degree of each property. Different types of physicochemical conservation are defined to characterize relevant and irrelevant positions. The differences between statistical variances are taken together as the basis of hypothesis tests at each position to search for functionally significant mutable sites and to identify specific mutagenesis targets. The outcome is used to statistically predict physicochemical consensus sequences based on different properties and to calculate the amino acid propensities at each position in a given protein. Hence, amino acid positions are identified that are putatively responsible for function, specificity, stability or binding interactions in a family of proteins. Once these key functional positions are identified, position-specific statistical distributions are applied to divide the 20 common protein amino acids in each position of the protein's primary sequence into a group of functionally non-disruptive amino acids and a second group of functionally deleterious amino acids.
With this approach, not only conserved amino acid positions in a protein family can be labeled as functionally relevant, but also non-conserved amino acid positions can be identified to have a physicochemically meaningful functional effect. These results become a discriminative tool in the selection and elaboration of rational mutagenesis strategies for the protein. They can also be used to predict if a given nsSNP, identified, for instance, in a genomic-scale analysis, can have a functional implication for a particular protein and which nsSNPs are most likely to be functionally silent for a protein. This analytical tool could be used to rapidly and automatically discard any irrelevant nsSNP and guide the research focus toward functionally significant mutations. Based on preliminary results and applications, this technique shows promising performance as a valuable bioinformatics tool to aid in the development of new protein variants and in the understanding of function-structure relationships in proteins.