This article is part of the supplement: Genetic Analysis Workshop 16
Analysis of genome-wide association data by large-scale Bayesian logistic regression
1 Department of Biostatistics, School of Public Health, Columbia University, 722 West 168th Street, New York, NY 10032, USA
2 Department of Mathematics and Statistics, Georgia State University, 750 COE, 7th Floor, 30 Pryor Street, Atlanta, GA 30303, USA
BMC Proceedings 2009, 3(Suppl 7):S16. Published: 15 December 2009
Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression has been proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data from a GWA study while controlling for collinearity and overfitting in a high-dimensional predictor space, we propose a variable selection procedure using Bayesian logistic regression. We explored the connection between Bayesian regression with certain priors and L1- and L2-penalized logistic regression. After analyzing a large number of SNPs simultaneously in a Bayesian regression, we selected important SNPs for further consideration. With far fewer SNPs of interest, the problems of multiple comparisons and collinearity are less severe. We conducted simulation studies to examine the probability of correctly selecting disease-contributing SNPs and applied the developed methods to analyze the Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium data.
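The link between priors and penalties mentioned in the abstract can be illustrated with a small sketch. It assumes the priors in question are the standard ones: with independent Laplace priors on the regression coefficients, the posterior mode (MAP estimate) coincides with the L1-penalized logistic regression estimate, and with Gaussian priors it coincides with the L2-penalized estimate. The code below is not the authors' implementation. The simulated genotypes, the function names `simulate_snps` and `fit_map_laplace`, the penalty weight `lam`, and the proximal-gradient (ISTA) solver are all illustrative assumptions.

```python
import numpy as np


def simulate_snps(n=200, p=50, causal=(3, 17), effect=1.5, seed=0):
    """Simulate standardized additive genotypes and case-control labels.

    Hypothetical setup: independent SNPs, two causal loci, logistic disease model.
    """
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.2, 0.5, size=p)               # minor allele frequencies
    X = rng.binomial(2, maf, size=(n, p)).astype(float)  # genotypes coded 0/1/2
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # center and scale columns
    beta = np.zeros(p)
    beta[list(causal)] = effect
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = rng.binomial(1, prob).astype(float)           # 1 = case, 0 = control
    return X, y, beta


def fit_map_laplace(X, y, lam=0.05, n_iter=2000):
    """MAP estimate under independent Laplace priors.

    Equivalent to minimizing mean logistic loss + lam * ||beta||_1,
    solved here by proximal gradient descent (ISTA) with soft-thresholding.
    """
    n, p = X.shape
    # The gradient of the mean logistic loss is Lipschitz with constant
    # at most ||X||_2^2 / (4n); use its inverse as a fixed step size.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))        # fitted case probabilities
        grad = X.T @ (mu - y) / n                     # gradient of mean logistic loss
        z = beta - step * grad                        # plain gradient step
        # Soft-thresholding is the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


X, y, true_beta = simulate_snps()
beta_hat = fit_map_laplace(X, y, lam=0.05)
selected = np.flatnonzero(beta_hat)                   # SNPs with nonzero MAP coefficient
print("SNPs kept for further consideration:", selected)
```

In this sketch the Laplace prior sets most coefficients exactly to zero, so the nonzero entries of `beta_hat` form the reduced SNP set carried forward. Replacing the soft-thresholding line with `beta = z / (1.0 + step * lam)` would give the Gaussian-prior (L2) counterpart, which shrinks coefficients toward zero without setting any of them exactly to zero.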