A comparative study on gene-set analysis methods for assessing differential expression associated with the survival phenotype
- Equal contributors
1 Department of Mathematics and Statistics, Sejong University, Seoul, 143-747, Korea
2 Department of Applied Statistics, University of Suwon, Hwaseong, 445-743, Korea
BMC Bioinformatics 2011, 12:377 doi:10.1186/1471-2105-12-377Published: 26 September 2011
Many gene-set analysis methods have been previously proposed and compared through simulation studies and analysis of real datasets for binary phenotypes. We focused on the survival phenotype and compared the performances of Gene Set Enrichment Analysis (GSEA), Global Test (GT), Wald-type Test (WT) and Global Boost Test (GBST) methods in a simulation study and on two ovarian cancer data sets. We considered two versions of GSEA by allowing different weights: GSEA1 uses equal weights, yielding results similar to the Kolmogorov-Smirnov test; while GSEA2's weights are based on the correlation between genes and the phenotype.
We compared GSEA1, GSEA2, GT, WT and GBST in a simulation study with various settings for the correlation structure of the genes and the association parameter between the survival outcome and the genes. Simulation results indicated that GT, WT and GBST consistently have higher power than GSEA1 and GSEA2 across all scenarios. However, the power of the five tests depends on the combination of correlation structure and association parameter. For the ovarian cancer data set, using the FDR threshold of q < 0.1, the GT, WT and GBST detected 12, 6 and 8 significant pathways, respectively, whereas neither GSEA1 nor GSEA2 detected any significant pathways. In addition, among the pathways found significant by GT, WT, and GBST, three pathways - Purine metabolism, Leukocyte transendothelial migration and Jak-STAT signaling pathway - overlapped with those reported in previous ovarian cancer microarray studies.
Simulation studies and a real data example indicate that GT, WT and GBST tend to have high power, whereas GSEA1 and GSEA2 have lower power. We also found that the power of the five tests is much higher when genes are correlated than when genes are independent, when survival is positively associated with genes. It seems that there is a synergistic effect in detecting significant gene sets when significant genes have within-class correlation and the association between survival and genes is positive or negative (i.e., one-direction correlation).