Multilocus association mapping using generalized ridge logistic regression
1 Department of Statistics, University of Chicago, 5734 S. University Avenue, Chicago, IL 60637, USA
2 Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA
3 Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, 4A Datun Road, Beijing 100101, China
BMC Bioinformatics 2011, 12:384 doi:10.1186/1471-2105-12-384Published: 29 September 2011
In genome-wide association studies, it is widely accepted that multilocus methods are more powerful than testing single-nucleotide polymorphisms (SNPs) one at a time. Among statistical approaches considering many predictors simultaneously, scan statistics are an effective tool for detecting susceptibility genomic regions and mapping disease genes. In this study, inspired by the idea of scan statistics, we propose a novel sliding window-based method for identifying a parsimonious subset of contiguous SNPs that best predict disease status.
Within each sliding window, we apply a forward model selection procedure using generalized ridge logistic regression for model fitness in each step. In power simulations, we compare the performance of our method with that of five other methods in current use. Averaging power over all the conditions considered, our method dominates the others. We also present two published datasets where our method is useful in causal SNP identification.
Our method can automatically combine genetic information in local genomic regions and allow for linkage disequilibrium between SNPs. It can overcome some defects of the scan statistics approach and will be very promising in genome-wide case-control association studies.