Large-scale multiple testing in genome-wide association studies via region-specific hidden Markov models
Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
BMC Bioinformatics 2013, 14:282 doi:10.1186/1471-2105-14-282Published: 25 September 2013
Identifying genetic variants associated with complex human diseases is a great challenge in genome-wide association studies (GWAS). Single nucleotide polymorphisms (SNPs) arising from genetic background are often dependent. The existing methods, i.e., local index of significance (LIS) and pooled local index of significance (PLIS), were both proposed for modeling SNP dependence and assumed that the whole chromosome follows a hidden Markov model (HMM). However, the fact that SNP data are often collected from separate heterogeneous regions of a single chromosome encourages different chromosomal regions to follow different HMMs. In this research, we developed a data-driven penalized criterion combined with a dynamic programming algorithm to find change points that divide the whole chromosome into more homogeneous regions. Furthermore, we extended PLIS to analyze the dependent tests obtained from multiple chromosomes with different regions for GWAS.
The simulation results show that our new criterion can improve the performance of the model selection procedure and that our region-specific PLIS (RSPLIS) method is better than PLIS at detecting disease-associated SNPs when there are multiple change points along a chromosome. Our method has been used to analyze the Daly study, and compared with PLIS, RSPLIS yielded results that more accurately detected disease-associated SNPs.
The genomic rankings based on our method differ from the rankings based on PLIS. Specifically, for the detection of genetic variants with weak effect sizes, the RSPLIS method was able to rank them more efficiently and with greater power.