This article is part of the supplement: Genetic Analysis Workshop 16

Open Access Proceedings

Analysis of genome-wide association data by large-scale Bayesian logistic regression

Yuanjia Wang1*, Nanshi Sha1 and Yixin Fang2

Author Affiliations

1 Department of Biostatistics, School of Public Health, Columbia University, 722 West 168th Street, New York, NY 10032, USA

2 Department of Mathematics and Statistics, Georgia State University, 750 COE, 7th Floor, 30 Pryor Street, Atlanta, GA 30303, USA

BMC Proceedings 2009, 3(Suppl 7):S16  doi:

Published: 15 December 2009


Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data from a GWA study while controlling for collinearity and overfitting in a high-dimensional predictor space, we propose a variable selection procedure using Bayesian logistic regression. We explored a connection between Bayesian regression with certain priors and L1 and L2 penalized logistic regression. After analyzing a large number of SNPs simultaneously in a Bayesian regression, we selected important SNPs for further consideration. With far fewer SNPs of interest, problems of multiple comparisons and collinearity are less severe. We conducted simulation studies to examine the probability of correctly selecting disease-contributing SNPs and applied the developed methods to analyze the Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium data.
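
The link between priors and penalties mentioned above can be illustrated with a small sketch. Under independent Gaussian priors on the SNP coefficients, the posterior mode of a logistic regression coincides with an L2-penalized (ridge) fit; under independent Laplace priors it coincides with an L1-penalized (lasso) fit, which can set coefficients exactly to zero and so acts as a variable selector. The code below is not the authors' implementation: the function names, the simulated genotypes, the penalty weight `lam`, the step size, and the choice of proximal gradient descent as the optimizer are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(b, t):
    """Proximal operator of t*|b|: shrink b toward zero by t."""
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0

def fit_map_logistic(X, y, penalty="l1", lam=0.05, step=0.1, n_iter=200):
    """Posterior-mode (MAP) logistic regression by (proximal) gradient descent.

    penalty="l2": Gaussian prior, adds (lam/2) * sum(beta_j^2) to the loss.
    penalty="l1": Laplace prior, adds lam * sum(|beta_j|) to the loss.
    The intercept b0 is left unpenalized (an illustrative choice).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # residuals: fitted probability minus observed case/control status
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * g0
        if penalty == "l2":
            beta = [bj - step * (gj + lam * bj) for bj, gj in zip(beta, grad)]
        else:
            beta = [soft_threshold(bj - step * gj, step * lam)
                    for bj, gj in zip(beta, grad)]
    return b0, beta

# Hypothetical simulated data: additive genotypes (0/1/2), one causal SNP.
random.seed(0)
n, p, causal, maf = 200, 30, 3, 0.3
X = [[(random.random() < maf) + (random.random() < maf) for _ in range(p)]
     for _ in range(n)]
y = [1 if random.random() < sigmoid(-1.0 + 1.0 * row[causal]) else 0 for row in X]

_, beta_l1 = fit_map_logistic(X, y, penalty="l1")
_, beta_l2 = fit_map_logistic(X, y, penalty="l2")
selected = [j for j, b in enumerate(beta_l1) if b != 0.0]
print("SNPs kept by the L1 (Laplace prior) fit:", selected)
print("nonzero coefficients under L2 (Gaussian prior):",
      sum(b != 0.0 for b in beta_l2))
```

In a sketch like this, the SNPs with nonzero L1 coefficients would be the ones carried forward for further consideration, while the L2 fit shrinks coefficients without generally zeroing them. A real GWA analysis would need a far more scalable optimizer and a principled way to choose the prior scale.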