Open Access, Highly Accessed, Methodology article

MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study

Xiang Wan1, Can Yang1, Qiang Yang2, Hong Xue3, Nelson LS Tang4 and Weichuan Yu1

1 Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, PR China

2 Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, PR China

3 Department of Biochemistry, Hong Kong University of Science and Technology, Hong Kong, PR China

4 Laboratory for Genetics of Disease Susceptibility, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, PR China

BMC Bioinformatics 2009, 10:13 doi:10.1186/1471-2105-10-13

Published: 9 January 2009

Abstract

Background

Interactions among multiple single nucleotide polymorphisms (SNPs) are widely hypothesized to affect an individual's susceptibility to complex diseases. Although many methods have been proposed to identify multi-SNP interactions and quantify their importance, few can handle genome-wide data, because the search space grows combinatorially and high-order interactions are difficult to evaluate statistically with limited samples.

Results

Three comparative experiments were designed to evaluate the performance of MegaSNPHunter. The first uses synthetic data generated from epistasis models. The second uses a genome-wide study of Parkinson disease, with data acquired using Illumina HumanHap300 SNP chips. The third uses the rheumatoid arthritis study from the Wellcome Trust Case Control Consortium (WTCCC), which used the Affymetrix GeneChip 500K Mapping Array Set. MegaSNPHunter outperforms the best existing solution in this area and reports many potential interactions for the two real studies.

Conclusion

The experimental results on synthetic data and on two real data sets demonstrate that the proposed approach outperforms the best currently available solution for large-scale SNP data, both in speed and in detecting potential interactions that were not previously identified. To our knowledge, MegaSNPHunter is the first approach capable of identifying disease-associated SNP interactions from WTCCC studies, and it is promising for practical disease prognosis.
