This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data
Large-scale risk prediction applied to Genetic Analysis Workshop 17 mini-exome sequence data
1 Department of Epidemiology and Public Health, Yale University, 60 College Street, New Haven, CT 06520, USA
2 Keck Laboratory, Yale University, 300 George Street, New Haven, CT 06511, USA
3 Department of Electronic Science and Technology, University of Science and Technology of China, Hefei, China
4 Bioinformatics and Molecular Imaging Key Laboratory, Huazhong University of Science and Technology, Wuhan, China
BMC Proceedings 2011, 5(Suppl 9):S46 doi:10.1186/1753-6561-5-S9-S46
Published: 29 November 2011
We consider the application of Efron’s empirical Bayes classification method to risk prediction in a genome-wide association study using the Genetic Analysis Workshop 17 (GAW17) data. A major advantage of this method is that the effect size distribution for the set of possible features is estimated empirically, and all subsequent parameter estimation and risk prediction are guided by this distribution. Here, we generalize Efron’s method to accommodate some of the peculiarities of the GAW17 data. In particular, we introduce two extensions of Efron’s model: a weighted empirical Bayes model and a joint covariance model that properly incorporates the annotation information of single-nucleotide polymorphisms (SNPs). In the course of our analysis, we examine several aspects of the possible simulation model, including the identity of the most important genes, the differing effects of synonymous and nonsynonymous SNPs, and the relative roles of covariates and genes in conferring disease risk. Finally, we compare the three methods (Efron’s original method and the two extensions) with each other and with two other classifiers, random forest and neural network.