Methodology article (Open Access)

A genetic ensemble approach for gene-gene interaction identification

Pengyi Yang (1,2,3)*, Joshua WK Ho (1,3), Albert Y Zomaya (1,4) and Bing B Zhou (1,4)*

Author Affiliations

1 School of Information Technologies, University of Sydney, NSW 2006, Australia

2 School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia

3 NICTA, Australian Technology Park, Eveleigh, NSW 2015, Australia

4 Centre for Distributed and High Performance Computing, University of Sydney, NSW 2006, Australia

BMC Bioinformatics 2010, 11:524  doi:10.1186/1471-2105-11-524

Published: 21 October 2010

Abstract

Background

It has now become clear that gene-gene and gene-environment interactions are ubiquitous and fundamental mechanisms in the development of complex diseases. Although considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, identifying them accurately has proven very challenging.

Methods

In this paper, we propose a new approach for identifying the gene-gene and gene-environment interactions underlying complex diseases. The approach is a hybrid algorithm, called the genetic ensemble, that combines a genetic algorithm (GA) with an ensemble of classifiers. It converts the original problem of single nucleotide polymorphism (SNP) interaction identification into a data mining problem of combinatorial feature selection. By collecting the various subsets of SNPs and environmental factors generated in multiple GA runs, patterns of gene-gene and gene-environment interactions can be extracted using a simple combinatorial ranking method. This study also considers combining the identification results obtained from multiple algorithms. A novel formula based on pairwise double fault is designed to quantify the degree of complementarity between algorithms.
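The combinatorial ranking step can be illustrated with a minimal sketch. It assumes that each GA run returns one selected subset of feature names, and that a combination is scored by the fraction of runs in which all of its members were selected together. The function name `rank_combinations`, the feature labels, and this frequency-based score are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def rank_combinations(selected_subsets, order=2):
    """Rank feature combinations by how often they co-occur across GA runs.

    selected_subsets: one iterable of feature names per GA run (hypothetical input).
    order: size of the combinations to rank (2 = pairwise interactions).
    Returns a list of (combination, fraction_of_runs), most frequent first.
    """
    total = len(selected_subsets)
    if total == 0:
        return []
    counts = Counter()
    for subset in selected_subsets:
        # Sort the de-duplicated names so each combination has one canonical form.
        for combo in combinations(sorted(set(subset)), order):
            counts[combo] += 1
    # Most frequent first; ties are broken alphabetically for reproducibility.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(combo, n / total) for combo, n in ranked]


# Hypothetical subsets selected in four GA runs (SNPs plus one environmental factor).
ga_runs = [
    ["snp3", "snp7", "smoking"],
    ["snp3", "snp7"],
    ["snp1", "snp3", "snp7"],
    ["snp2", "smoking"],
]

ranking = rank_combinations(ga_runs, order=2)
for combo, freq in ranking[:3]:
    print(combo, freq)
```

In this toy input the pair (`snp3`, `snp7`) is selected together in three of the four runs, so it ranks first. Raising `order` to 3 would rank three-way combinations in the same way.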

Conclusions

Our simulation study demonstrates that the proposed genetic ensemble algorithm has identification power comparable to that of Multifactor Dimensionality Reduction (MDR) and slightly better than that of Polymorphism Interaction Analysis (PIA), the two most popular methods for gene-gene interaction identification. More importantly, the identification results generated by our genetic ensemble algorithm are highly complementary to those obtained by PIA and MDR. Results from our simulation studies and a real-world data application also confirm the effectiveness of the proposed algorithm, as well as the potential benefits of combining identification results from different algorithms.
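To make the notion of complementarity concrete, the sketch below computes the standard pairwise double-fault measure from the classifier-ensemble literature: the fraction of cases in which both methods fail. This is not the novel formula proposed in the paper, which is only described as being based on pairwise double fault. The per-dataset success flags and the name `double_fault` are hypothetical.

```python
def double_fault(success_a, success_b):
    """Fraction of cases in which both methods fail.

    success_a, success_b: equal-length sequences of booleans, one entry per
    simulated dataset, True if the method identified the true interaction.
    A lower value means the two methods rarely fail on the same dataset,
    i.e. they are more complementary.
    """
    if len(success_a) != len(success_b):
        raise ValueError("both methods must be evaluated on the same datasets")
    if len(success_a) == 0:
        raise ValueError("at least one dataset is required")
    both_failed = sum(1 for a, b in zip(success_a, success_b) if not a and not b)
    return both_failed / len(success_a)


# Hypothetical identification outcomes on six simulated datasets.
method_a = [True, False, True, False, True, True]
method_b = [True, False, False, True, True, True]

df = double_fault(method_a, method_b)
print(f"double fault: {df:.3f}")
```

Here the two methods fail together on only one of the six datasets, so combining their results could recover the interaction on the other five.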