This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data

Open Access Proceedings

Identification of functional genetic variation in exome sequence analysis

Andrew Jaffe (1), Genevieve Wojcik (1), Audrey Chu (1), Asieh Golozar (1,2), Ankit Maroo (1), Priya Duggal (1) and Alison P Klein (1,3,4)*

  • * Corresponding author: Alison P Klein

  • † Equal contributors

Author Affiliations

1 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe Street, Baltimore, MD 21205, USA

2 Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 6120 Executive Boulevard, Bethesda, MD 20892, USA

3 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Harry and Jeanette Weinberg Building, 401 North Broadway, Baltimore, MD 21231, USA

4 Department of Pathology, Johns Hopkins School of Medicine, 600 North Wolfe Street, Baltimore, MD 21287, USA

BMC Proceedings 2011, 5(Suppl 9):S13. doi:10.1186/1753-6561-5-S9-S13

Published: 29 November 2011


Recent technological advances have allowed us to study individual genomes at base-pair resolution and have demonstrated that the average exome harbors more than 15,000 genetic variants. However, our ability to understand the biological significance of the identified variants and to connect them with phenotypes is limited. Because detailed studies of each identified variant, whether population based or functional, are not practicable, the first step in this process is to identify genetic variation that is likely to change protein structure and function. Algorithms that yield valid predictions of a variant’s functional significance are therefore needed. Over the past decade, several programs have been developed to predict the probability that an observed sequence variant will have a deleterious effect on protein function. These algorithms range from empirical programs that classify variants using known biochemical properties to statistical algorithms trained on a variety of data sources, including sequence conservation data, biochemical properties, and functional data. Using data from the pilot3 study of the 1000 Genomes Project, available through Genetic Analysis Workshop 17, we compared the results of four programs (SIFT, PolyPhen, MAPP, and VarioWatch) used to predict the functional relevance of variants in 101 genes. The analysis was conducted without knowledge of the simulation model. Agreement between programs was modest, ranging from 59.4% to 71.4%; only 3.5% of variants were classified as deleterious and 10.9% as tolerated by all four programs.
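The agreement figures above can be illustrated with a minimal sketch. It assumes that agreement is measured between pairs of programs and that each program's output has already been harmonized to one of two labels ("deleterious" or "tolerated"); the source does not spell out either detail here. The calls below are hypothetical toy values, not Genetic Analysis Workshop 17 data, and the function names are illustrative only.

```python
from itertools import combinations

# Hypothetical toy calls for five variants; NOT the GAW17 data.
# Assumption: each program's native output has been mapped to two labels.
calls = {
    "SIFT":       ["deleterious", "tolerated", "tolerated", "deleterious", "tolerated"],
    "PolyPhen":   ["deleterious", "tolerated", "deleterious", "deleterious", "tolerated"],
    "MAPP":       ["deleterious", "deleterious", "tolerated", "tolerated", "tolerated"],
    "VarioWatch": ["deleterious", "tolerated", "tolerated", "deleterious", "tolerated"],
}


def pairwise_agreement(calls):
    """Percentage of variants on which each pair of programs gives the same label."""
    result = {}
    for a, b in combinations(calls, 2):
        pairs = list(zip(calls[a], calls[b]))
        matches = sum(x == y for x, y in pairs)
        result[(a, b)] = 100.0 * matches / len(pairs)
    return result


def unanimous_percentage(calls, label):
    """Percentage of variants that every program assigns to `label`."""
    per_variant = list(zip(*calls.values()))
    unanimous = sum(all(c == label for c in row) for row in per_variant)
    return 100.0 * unanimous / len(per_variant)


if __name__ == "__main__":
    for (a, b), pct in pairwise_agreement(calls).items():
        print(f"{a} vs {b}: {pct:.1f}% agreement")
    for label in ("deleterious", "tolerated"):
        print(f"All four programs say {label}: {unanimous_percentage(calls, label):.1f}%")
```

Raw percentage agreement is used here only because it matches the way the results are reported; a chance-corrected statistic would need a different calculation.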