Comparison of information-theoretic to statistical methods for gene-gene interactions in the presence of genetic heterogeneity
- Equal contributors
1 Department of Biostatistics, State University of New York, Buffalo, NY 14260, USA
2 Department of Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
3 Department of Computer Science and Engineering, State University of New York, Buffalo, NY 14260, USA
4 Department of Biostatistics, University of Toronto
5 Ontario Cancer Institute, Toronto, Ontario, Canada
6 Department of Pharmaceutical Sciences, State University of New York, Buffalo, NY 14260, USA
BMC Genomics 2010, 11:487 doi:10.1186/1471-2164-11-487
Published: 3 September 2010
Multifactorial diseases such as cancer and cardiovascular disease are caused by a complex interplay between genes and the environment. Detecting these interactions remains challenging because of computational limitations. Information-theoretic approaches use computationally efficient directed search strategies and thus offer a feasible solution to this problem. However, the power of information-theoretic methods for interaction analysis has not been systematically evaluated. In this work, we compare the power and Type I error of an information-theoretic approach with those of existing interaction analysis methods.
The k-way interaction information (KWII) metric for identifying variable combinations involved in gene-gene interactions (GGI) was assessed on several simulated data sets generated under models of genetic heterogeneity driven by susceptibility-increasing loci with varying allele frequencies, penetrance values and heritabilities. The power and proportion of false positives of the KWII were compared with those of multifactor dimensionality reduction (MDR), the restricted partitioning method (RPM) and logistic regression.
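As it is commonly defined (a generalization of McGill's interaction information), the KWII of a variable set S is KWII(S) = −Σ_{T⊆S} (−1)^{|S|−|T|} H(T), where H is Shannon entropy and H(∅) = 0. For two variables this reduces to mutual information; for two SNPs A, B and a phenotype P it equals I(A;B|P) − I(A;B), so in this sign convention a positive value indicates synergy. The sketch below computes it with plug-in (empirical) entropies on discrete data; representing samples as tuples of discrete codes is an assumption for illustration, not the paper's implementation.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(rows, idx):
    """Plug-in Shannon entropy (in bits) of the joint distribution of columns idx."""
    n = len(rows)
    counts = Counter(tuple(r[i] for i in idx) for r in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def kwii(rows, variables):
    """KWII(S) = -sum over nonempty subsets T of S of (-1)^(|S|-|T|) * H(T).

    The empty subset is skipped because H(empty set) = 0.
    """
    k = len(variables)
    return -sum((-1) ** (k - size) * entropy(rows, subset)
                for size in range(1, k + 1)
                for subset in combinations(variables, size))


# Purely epistatic toy data: phenotype = A XOR B, with A and B independent.
xor_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(kwii(xor_rows, (0, 1, 2)))  # → 1.0 (one bit of synergy)
```

In the XOR example neither SNP alone carries information about the phenotype, yet the three-way KWII is one bit, which is the kind of pure interaction that marginal tests miss.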
The power of the KWII was considerably greater than that of MDR in all six simulation models examined. For a given disease prevalence and high heritability, the power of both RPM and KWII exceeded 95%. For models with low heritability and/or genetic heterogeneity, the power of the KWII was consistently greater than that of RPM; at α = 0.001, the gain in power of the KWII over RPM ranged from 4.7% to 14.2% in the three models at the lowest heritability values examined. The KWII performed similarly to logistic regression.
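Power figures like these come from repeating the analysis over many simulated replicates and counting how often the true interacting combination is declared significant at level α. The abstract does not state how KWII significance was assessed, so the sketch below assumes a phenotype-permutation test. The model functions (`simulate_xor_model`, `simulate_null`) are hypothetical: a strongly epistatic two-locus model with Hardy-Weinberg genotypes coded 0/1/2 and population (not case-control) sampling, which is a simplification of the paper's heterogeneity models. Running the same procedure on null data estimates the Type I error.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(rows, idx):
    """Plug-in Shannon entropy (bits) of columns idx."""
    n = len(rows)
    counts = Counter(tuple(r[i] for i in idx) for r in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def kwii(rows, variables):
    """KWII(S) = -sum over nonempty T of S of (-1)^(|S|-|T|) * H(T)."""
    k = len(variables)
    return -sum((-1) ** (k - size) * entropy(rows, subset)
                for size in range(1, k + 1)
                for subset in combinations(variables, size))


def genotype(rng, maf):
    """Minor-allele count (0/1/2) under Hardy-Weinberg equilibrium."""
    return (rng.random() < maf) + (rng.random() < maf)


def simulate_xor_model(rng, n=200, maf=0.3, high=0.9, low=0.1):
    """Hypothetical model: high risk when exactly one locus carries a minor allele."""
    rows = []
    for _ in range(n):
        g1, g2 = genotype(rng, maf), genotype(rng, maf)
        risk = high if (g1 > 0) != (g2 > 0) else low
        rows.append((g1, g2, int(rng.random() < risk)))
    return rows


def simulate_null(rng, n=200, maf=0.3, prevalence=0.5):
    """Hypothetical null model: phenotype independent of both loci."""
    return [(genotype(rng, maf), genotype(rng, maf), int(rng.random() < prevalence))
            for _ in range(n)]


def permutation_pvalue(rows, rng, n_perm=99):
    """Permutation p-value of KWII(G1, G2, P); the phenotype is the last column."""
    cols = tuple(range(len(rows[0])))
    observed = kwii(rows, cols)
    phenos = [r[-1] for r in rows]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(phenos)  # break any genotype-phenotype association
        permuted = [r[:-1] + (p,) for r, p in zip(rows, phenos)]
        if kwii(permuted, cols) >= observed - 1e-12:  # ties count as exceeding
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def estimated_power(simulate, alpha, n_reps=20, n_perm=99, seed=1):
    """Fraction of simulated replicates in which KWII is significant at alpha."""
    rng = random.Random(seed)
    hits = sum(permutation_pvalue(simulate(rng), rng, n_perm) <= alpha
               for _ in range(n_reps))
    return hits / n_reps


print(estimated_power(simulate_xor_model, alpha=0.05))  # power
print(estimated_power(simulate_null, alpha=0.05))       # Type I error
```

The smallest attainable permutation p-value is 1/(n_perm + 1), so resolving α = 0.001 as in the results above needs at least 999 permutations per replicate; the small defaults here only keep the sketch fast.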
Information-theoretic models are flexible and have excellent power to detect GGI under a variety of conditions that characterize complex diseases.