This article is part of the supplement: Genetic Analysis Workshop 16
Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16
1 Department of Mathematics, Hope College, 27 Graves Place, Holland, Michigan 49423, USA
2 Department of Mathematics, Rose-Hulman Institute of Technology, 5500 Wabash Avenue, Terre Haute, Indiana 47803, USA
3 Department of Mathematics, Seattle Pacific University, 3307 Third Avenue West, Seattle, Washington 98119, USA
BMC Proceedings 2009, 3(Suppl 7):S96. Published: 15 December 2009
Recently, gene set analysis (GSA) has been extended from gene expression data to single-nucleotide polymorphism (SNP) data in genome-wide association studies (GWAS). Applications of GSA to SNP data have so far used two statistics that are popular in gene expression analysis: gene set enrichment analysis (GSEA) and Fisher's exact test (FET). However, GSEA and FET have shown a lack of power and robustness in the analysis of gene expression data. The purpose of this work is to investigate whether the same problems arise in the analysis of SNP data. We conclude that GSEA and FET are not optimal for the analysis of SNP data when compared with the SUMSTAT method. In an analysis of real SNP data from the Framingham Heart Study, SUMSTAT identifies many more significant gene sets than the other methods. In an analysis of simulated data, SUMSTAT demonstrates high power and better control of the type I error rate. GSA is a promising approach to the analysis of SNP data in GWAS, and using the SUMSTAT statistic instead of GSEA or FET may increase power and robustness.
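To make the contrast between two of these statistics concrete, the following Python sketch computes a one-sided FET and a SUMSTAT p-value for a single gene set from gene-level association statistics. It is a minimal illustration, not the authors' implementation: the function names, the chi-square cutoff of 3.84 used to call a gene "significant" for FET, and the gene-sampling null for SUMSTAT (random gene sets of the same size) are all assumptions of this sketch. A phenotype-permutation null, which preserves correlation among genes, is generally preferable but requires the raw genotypes. GSEA is omitted for brevity.

```python
import math
import random


def fisher_exact_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test on the 2x2 table

                     significant   not significant
        in set            a               b
        not in set        c               d

    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    n_set, n_sig, n_total = a + b, a + c, a + b + c + d
    # Integer numerator keeps the sum exact before the single division.
    numerator = sum(
        math.comb(n_sig, k) * math.comb(n_total - n_sig, n_set - k)
        for k in range(a, min(n_set, n_sig) + 1)
    )
    return numerator / math.comb(n_total, n_set)


def fet_gene_set(gene_stats, gene_set, threshold=3.84):
    """FET for one gene set; a gene counts as 'significant' if its statistic
    is >= threshold (3.84 is roughly the 0.05 critical value of a 1-df
    chi-square; the cutoff is an assumption of this sketch)."""
    members = [g for g in gene_stats if g in gene_set]
    a = sum(1 for g in members if gene_stats[g] >= threshold)
    b = len(members) - a
    c = sum(1 for s in gene_stats.values() if s >= threshold) - a
    d = len(gene_stats) - len(members) - c
    return fisher_exact_greater(a, b, c, d)


def sumstat_gene_set(gene_stats, gene_set, n_perm=2000, seed=0):
    """SUMSTAT for one gene set: the sum of the gene-level statistics in the
    set, compared with sums over random gene sets of the same size
    (gene sampling, an assumption of this sketch)."""
    rng = random.Random(seed)
    members = [g for g in gene_stats if g in gene_set]
    observed = sum(gene_stats[g] for g in members)
    all_stats = list(gene_stats.values())
    hits = sum(
        1 for _ in range(n_perm)
        if sum(rng.sample(all_stats, len(members))) >= observed
    )
    # Add-one correction so the permutation p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 45 background genes with weak statistics and a
    # 5-gene set whose members are all moderately, but not individually
    # significantly, associated.
    stats = {f"bg{i}": 0.5 for i in range(45)}
    stats.update({f"set{i}": 3.0 for i in range(5)})
    gene_set = {f"set{i}" for i in range(5)}
    print("FET p-value:", fet_gene_set(stats, gene_set))
    print("SUMSTAT p-value:", sumstat_gene_set(stats, gene_set))
```

In this toy data every set gene falls below the FET cutoff, so FET counts no significant genes in the set and returns a p-value of 1.0, whereas SUMSTAT accumulates the modest signals and returns a small p-value. This illustrates, under the stated assumptions, how a threshold-free statistic can retain power when many genes carry weak effects.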