Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Highly Accessed Research article

Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering

Xuan Guo, Yu Meng, Ning Yu and Yi Pan*

  • * Corresponding author: Yi Pan yipan@gsu.edu

  • † Equal contributors

Author Affiliations

Department of Computer Science, Georgia State University, 34 Peachtree Street, Atlanta, USA

For all author emails, please log on.

BMC Bioinformatics 2014, 15:102  doi:10.1186/1471-2105-15-102

Published: 10 April 2014

Abstract

Backgroud

Taking the advan tage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus-based methods are insufficient to detect interactions consisting of multiple-locus, which are broadly existing in complex traits. In addition, statistic tests for high order epistatic interactions with more than 2 SNPs propose computational and analytical challenges because the computation increases exponentially as the cardinality of SNPs combinations gets larger.

Results

In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We have constructed systematic experiments to compare powers performance against some recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we have applied our method on two real GWAS datasets, Age-related macular degeneration (AMD) and Rheumatoid arthritis (RA) datasets, where we find some novel potential disease-related genetic factors which are not shown up in detections of 2-loci epistatic interactions.

Conclusions

Experimental results on simulated data demonstrate that our method is more powerful than some recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, the running time of the cloud implementation for our method on AMD dataset and RA dataset are roughly 2 hours and 50 hours on a cluster with forty small virtual machines for detecting two-locus interactions, respectively. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multiple-locus epistatic interactions in GWAS.

Keywords:
Cloud computing; Genome-wide association studies; Dynamic clustering