Ranking insertion, deletion and nonsense mutations based on their effect on genetic information
Department of Cell & Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, Ontario, M5S 3B2, Canada
BMC Bioinformatics 2011, 12:299 doi:10.1186/1471-2105-12-299
Published: 22 July 2011
Genetic variations contribute to normal phenotypic differences as well as diseases, and new sequencing technologies are greatly increasing the capacity to identify these variations. Given the large number of variations now being discovered, computational methods to prioritize the functional importance of genetic variations are of growing interest. Thus far, computational tools have focused mainly on predicting the effects of amino acid-changing single nucleotide polymorphisms (SNPs), and little attention has been paid to insertion-deletion mutations (indels) or to nonsense SNPs that result in premature stop codons.
We propose computational methods to rank insertion-deletion mutations in coding as well as non-coding regions, and to rank nonsense mutations. We rank these variations by measuring the extent of their effect on biological function, based on the assumption that evolutionary conservation reflects function. Using sequence data from budding yeast and human, we show that variations that we predict to have larger effects segregate at significantly lower allele frequencies and occur less frequently than expected by chance, indicating stronger purifying selection. Furthermore, we find that insertions, deletions and premature stop codons associated with disease in humans have significantly larger predicted effects than those not associated with disease. Interestingly, the large-effect mutations associated with disease show a similar distribution of predicted effects to that expected for completely random mutations.
This demonstrates that the evolutionary conservation context of the sequences that harbour insertions, deletions and nonsense mutations can be used to predict and rank the effects of the mutations.
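As a rough illustration of the ranking idea summarized above, the following sketch scores each variant by the total conservation of the positions it disrupts and sorts variants by that score. It is not the scoring scheme used in the paper, which the abstract does not specify. The per-position conservation scores, the rule that an indel affects the positions it spans, the rule that a nonsense mutation affects every position downstream of the premature stop, and all function and field names are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple

# Hypothetical variant record: (name, kind, start, end).
# kind is "indel" or "nonsense"; positions index into the conservation list.
Variant = Tuple[str, str, int, int]


def indel_effect(conservation: Sequence[float], start: int, end: int) -> float:
    """Assumed effect of an indel: total conservation over positions [start, end)."""
    return sum(conservation[start:end])


def nonsense_effect(conservation: Sequence[float], stop_index: int) -> float:
    """Assumed effect of a premature stop: total conservation of all
    positions from stop_index to the end of the sequence."""
    return sum(conservation[stop_index:])


def rank_variants(variants: Sequence[Variant],
                  conservation: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from largest to smallest predicted effect."""
    scored = []
    for name, kind, start, end in variants:
        if kind == "indel":
            score = indel_effect(conservation, start, end)
        elif kind == "nonsense":
            score = nonsense_effect(conservation, start)  # end is ignored
        else:
            raise ValueError(f"unknown variant kind: {kind}")
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up conservation scores for a five-position sequence.
    conservation = [0.9, 0.1, 0.8, 0.2, 0.7]
    variants = [
        ("del_weak", "indel", 1, 2),       # removes one poorly conserved position
        ("del_strong", "indel", 0, 1),     # removes one highly conserved position
        ("early_stop", "nonsense", 2, 2),  # truncates from position 2 onward
    ]
    for name, score in rank_variants(variants, conservation):
        print(name, round(score, 3))
```

Under these assumptions, a stop codon early in a well-conserved sequence outranks a short deletion in a poorly conserved stretch, which matches the intuition that the conservation context of a mutation carries information about its likely effect.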