This article is part of the supplement: Beyond the Genome 2012

Open Access Poster presentation

Bernoulli mixture models in application to the evaluation of algorithms estimating functionality of missense mutations

Stephanie Hicks1*, Sharon E Plon2 and Marek Kimmel1

  • * Corresponding author: Stephanie Hicks

Author Affiliations

1 Department of Statistics, Rice University, Houston, TX, USA

2 Departments of Pediatrics and Human and Molecular Genetics, Baylor College of Medicine, Houston, TX, USA

BMC Proceedings 2012, 6(Suppl 6):P15  doi:10.1186/1753-6561-6-S6-P15


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1753-6561/6/S6/P15


Published: 1 October 2012

© 2012 Hicks et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Whole genome and whole exome sequencing projects yield thousands of missense mutations with unknown functionality. Directly estimating the sensitivity and specificity of bioinformatic algorithms that predict the impact of missense mutations on protein function requires a 'gold standard', that is, a set of mutations with known functionality. In the absence of a gold standard, additional statistical methods are needed to estimate the accuracy of these algorithms. It has been shown that informative predictions depend on both the algorithm and the sequence alignment employed, and that algorithms often disagree as to which mutations are predicted to be deleterious or neutral [1].

Materials and methods

To investigate the level of agreement, we define disjoint categories of mutations according to which algorithms predict each mutation to be deleterious or neutral. We have developed two statistical models, the Bernoulli mixture (BM) model and the augmented Bernoulli mixture (ABM) model, which are based on the capture-recapture technique and employ these disjoint categories. Applying these models allows us to jointly estimate the sensitivity and specificity of each algorithm considered without the use of a gold standard, and to estimate the proportion of deleterious mutations in a given set. These estimates may then be used to calculate the posterior probability that a given variant is deleterious. When considering n algorithms, there are 2^n disjoint categories employed by the ABM model, which includes 2n + 3 parameters; the BM model is a special case of the ABM model with 2n + 1 parameters. We use the expectation-maximization algorithm for parameter estimation.
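The estimation step can be illustrated with a minimal Python sketch. It assumes the BM model takes the form of a two-class mixture in which the n algorithms are conditionally independent given the true status, so that the 2n + 1 parameters are one sensitivity and one specificity per algorithm plus the proportion of deleterious mutations. That form, the starting values, and the function names (`fit_bm`, `posterior_deleterious`, `simulate`) are assumptions made for illustration and are not taken from the abstract; the ABM model is not covered.

```python
import math
import random
from collections import Counter


def class_likelihoods(pattern, prev, sens, spec):
    """Joint probability of a call pattern with each latent class.

    pattern holds one 0/1 call per algorithm (1 = predicted deleterious).
    """
    p_del, p_neu = prev, 1.0 - prev
    for j, call in enumerate(pattern):
        p_del *= sens[j] if call else 1.0 - sens[j]
        p_neu *= 1.0 - spec[j] if call else spec[j]
    return p_del, p_neu


def fit_bm(calls, n_iter=500, tol=1e-9):
    """EM for the assumed two-class Bernoulli mixture."""
    # Collapse mutations into at most 2**n disjoint categories.
    counts = Counter(map(tuple, calls))
    n_alg = len(next(iter(counts)))
    total = sum(counts.values())
    # Assumed starting values: every algorithm is better than chance.
    prev, sens, spec = 0.5, [0.7] * n_alg, [0.7] * n_alg
    last = -math.inf
    for _ in range(n_iter):
        # E-step: expected number of deleterious mutations per category.
        loglik, w_del = 0.0, {}
        for pat, c in counts.items():
            p_del, p_neu = class_likelihoods(pat, prev, sens, spec)
            w_del[pat] = c * p_del / (p_del + p_neu)
            loglik += c * math.log(p_del + p_neu)
        # M-step: weighted proportions.
        n_del = sum(w_del.values())
        n_neu = total - n_del
        prev = n_del / total
        for j in range(n_alg):
            sens[j] = sum(w for pat, w in w_del.items() if pat[j]) / n_del
            spec[j] = sum(counts[pat] - w
                          for pat, w in w_del.items() if not pat[j]) / n_neu
        if loglik - last < tol:  # log-likelihood has stopped improving
            break
        last = loglik
    return prev, sens, spec


def posterior_deleterious(pattern, prev, sens, spec):
    """Posterior probability that a variant with these calls is deleterious."""
    p_del, p_neu = class_likelihoods(pattern, prev, sens, spec)
    return p_del / (p_del + p_neu)


def simulate(n_mut, prev, sens, spec, seed=0):
    """Simulated predictions under the same assumed model."""
    rng = random.Random(seed)
    calls = []
    for _ in range(n_mut):
        deleterious = rng.random() < prev
        calls.append(tuple(
            int(rng.random() < (s if deleterious else 1.0 - t))
            for s, t in zip(sens, spec)))
    return calls
```

Calling `fit_bm(simulate(20000, 0.3, [0.9, 0.8, 0.85, 0.75], [0.9, 0.85, 0.8, 0.95]))` returns estimates that can be compared with the simulated values. Mixture likelihoods are invariant to swapping the two class labels, so the better-than-chance starting values are what tie the first class to "deleterious" in this sketch.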

Results

We apply the models to two types of predictions of functionality: simulated and real. Using simulated predictions, we accurately recover the true sensitivity and specificity values and report confidence regions. We show example posterior probabilities of a given variant being deleterious. When a gold standard is available, we show that the sensitivity and specificity estimates reported by the BM and ABM models closely match the sensitivity and specificity estimated directly from the true functionality status. To test our models on mutations without known functionality, we apply them to mutations obtained from the exomes of four individuals, which were sequenced at the Human Genome Sequencing Center at Baylor College of Medicine to identify cancer susceptibility genes for acute lymphocytic leukemia and lymphoma in children. Within each individual, we estimate the posterior probability that each variant is deleterious, and we apply an intersection filter to look for deleterious mutations shared by the three affected individuals but not present in the unaffected individual.
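A minimal Python sketch of such an intersection filter follows. The posterior cutoff of 0.9, the variant identifiers, and the reading of "not in the unaffected individual" as "absent from that individual's variant list" are all assumptions made for illustration; the abstract does not specify them.

```python
def intersection_filter(affected, unaffected, threshold=0.9):
    """Variants called deleterious in every affected individual and
    absent from the unaffected individual.

    affected:   list of dicts mapping variant id -> posterior probability
    unaffected: dict mapping variant id -> posterior probability
    threshold:  hypothetical cutoff for calling a variant deleterious
    """
    shared = None
    for posteriors in affected:
        hits = {v for v, p in posteriors.items() if p >= threshold}
        shared = hits if shared is None else shared & hits
    return sorted((shared or set()) - set(unaffected))


# Toy posteriors with made-up variant ids.
affected = [
    {"v1": 0.95, "v2": 0.97, "v3": 0.20},
    {"v1": 0.99, "v2": 0.92, "v4": 0.95},
    {"v1": 0.91, "v2": 0.96},
]
unaffected = {"v2": 0.98, "v5": 0.10}
candidates = intersection_filter(affected, unaffected)  # ["v1"]
```

Here `v2` is dropped because it also appears in the unaffected individual, and `v3` and `v4` are dropped because they are not deleterious in all three affected individuals.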

Conclusions

The BM and ABM models may be used to estimate the sensitivity and specificity of algorithms predicting the functionality of mutations without the use of a gold standard, and to calculate the posterior probability that a given variant is deleterious. These probabilities may be used downstream when searching for causal variants in next-generation sequencing data.

Acknowledgements

Supported by CPRIT grant R83940, NCI grant CA155767 and NCI T32 training grant CA096520.

References

  1. Hicks S, Wheeler DA, Plon SE, Kimmel M: Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed.

    Hum Mutat 2011, 32:661-668.