Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions
1 Laboratory of Human Carcinogenesis, National Cancer Institute, NIH, Bethesda, MD, USA
2 SAIC-Frederick, Advanced Biomedical Computing Center, National Cancer Institute, NIH, Frederick, MD, USA
3 Gradient, Inc.; Boston, MA, USA
4 Pediatric Oncology Branch, National Cancer Institute, Advanced Technology Center, Gaithersburg, MD, USA
BMC Bioinformatics 2008, 9:146 doi:10.1186/1471-2105-9-146Published: 6 March 2008
The risk of common diseases is likely determined by the complex interplay between environmental and genetic factors, including single nucleotide polymorphisms (SNPs). Traditional methods of data analysis are poorly suited for detecting complex interactions due to sparseness of data in high dimensions, which often occurs when data are available for a large number of SNPs for a relatively small number of samples. Validation of associations observed using multiple methods should be implemented to minimize likelihood of false-positive associations. Moreover, high-throughput genotyping methods allow investigators to genotype thousands of SNPs at one time. Investigating associations for each individual SNP or interactions between SNPs using traditional approaches is inefficient and prone to false positives.
We developed the Polymorphism Interaction Analysis tool (PIA version 2.0) to include different approaches for ranking and scoring SNP combinations, to account for imbalances between case and control ratios, stratify on particular factors, and examine associations of user-defined pathways (based on SNP or gene) with case status. PIA v. 2.0 detected 2-SNP interactions as the highest ranking model 77% of the time, using simulated data sets of genetic models of interaction (minor allele frequency = 0.2; heritability = 0.01; N = 1600) generated previously [Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, Moore JH: A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 2007, 31:306–315.]. Interacting SNPs were detected in both balanced (20 SNPs) and imbalanced data (case:control 1:2 and 1:4, 10 SNPs) in the context of non-interacting SNPs.
PIA v. 2.0 is a useful tool for exploring gene*gene or gene*environment interactions and identifying a small number of putative associations which may be investigated further using other statistical methods and in replication study populations.