This article is part of the supplement: 22nd International Conference on Genome Informatics: Systems Biology

Open Access Proceedings

SNP-PRAGE: SNP-based parametric robust analysis of gene set enrichment

Jaehoon Lee1, Soyeon Ahn2, Sohee Oh1, Bruce Weir3 and Taesung Park1*

Author Affiliations

1 Department of Statistics, Seoul National University, San 56-1, Shilim-dong, Seoul, Korea

2 Medical Research Collaborating Center, Seoul National University Bundang Hospital, 166 Gumi-ro, Bundang-gu, Seongnam 463-707, Korea

3 Department of Biostatistics, University of Washington, Box 357232, Seattle, Washington 98195, USA

BMC Systems Biology 2011, 5(Suppl 2):S11. doi:10.1186/1752-0509-5-S2-S11

Published: 14 December 2011

Abstract

Background

Current genome-wide association (GWA) analysis focuses mainly on single genetic variants, and may therefore miss variants that have small individual effects but large joint effects. Considering multiple SNPs jointly in GWA analysis can increase power. When multiple SNPs are considered jointly, however, the corresponding SNP-level association measures are likely to be correlated because of linkage disequilibrium (LD) among the SNPs.

Methods

We propose the SNP-based parametric robust analysis of gene-set enrichment (SNP-PRAGE) method, which adequately handles the correlation among SNP-level association measures and minimizes computing effort through a parametric assumption. SNP-PRAGE first derives gene-level association measures from SNP-level association measures by incorporating the size of the corresponding (or nearby) genes and the LD structure among SNPs. It then computes a gene-set-level summary over genes grouped according to shared biological knowledge. This two-step summarization makes the within-set association measures independent of each other, so the central limit theorem can be adequately applied in the parametric model.
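The two-step summarization can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so the gene-level step below assumes a simple LD-adjusted sum of SNP z-scores, and the set-level step assumes a PAGE-style z-statistic that compares the mean score within a gene set to the mean and standard deviation over all genes. The function names `gene_score`, `gene_set_z`, and `two_sided_p` are hypothetical, and gene size enters here only through the number of SNPs in the LD matrix.

```python
import math
from statistics import mean, pstdev


def gene_score(snp_z, ld_corr):
    """Collapse the SNP-level z-scores of one gene into one gene-level score.

    Assumption (not from the paper): the sum of correlated z-scores is divided
    by its standard deviation, sqrt(sum of all entries of the LD correlation
    matrix), so that the result has unit variance under the null.
    """
    var_of_sum = sum(sum(row) for row in ld_corr)
    return sum(snp_z) / math.sqrt(var_of_sum)


def gene_set_z(set_scores, all_scores):
    """Parametric z-statistic for one gene set (assumed PAGE-style form).

    By the central limit theorem, the mean of m independent gene-level scores
    is approximately normal with mean mu and standard deviation sigma/sqrt(m).
    """
    mu = mean(all_scores)        # mean over all genes
    sigma = pstdev(all_scores)   # standard deviation over all genes
    m = len(set_scores)          # number of genes in the set
    return (mean(set_scores) - mu) * math.sqrt(m) / sigma


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Toy data: three genes, each with SNP z-scores and an LD correlation matrix.
    genes = {
        "geneA": ([2.1, 1.8], [[1.0, 0.6], [0.6, 1.0]]),
        "geneB": ([0.3], [[1.0]]),
        "geneC": ([-0.5, 0.2], [[1.0, 0.1], [0.1, 1.0]]),
    }
    scores = {name: gene_score(z, r) for name, (z, r) in genes.items()}
    z_set = gene_set_z([scores["geneA"], scores["geneB"]], list(scores.values()))
    print(scores, z_set, two_sided_p(z_set))
```

Under this assumed scaling, two perfectly correlated SNPs with the same z-score contribute no more evidence than a single SNP, which is the kind of LD adjustment the gene-level step is meant to provide. No permutation is needed at either step, which is where the parametric approach saves computation.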

Results & conclusions

We applied SNP-PRAGE to two GWA data sets: hypertension data of 8,842 samples from the Korean population and bipolar disorder data of 4,806 samples from the Wellcome Trust Case Control Consortium (WTCCC). We found two enriched gene sets for hypertension and three enriched gene sets for bipolar disorder. In a simulation study comparing our method with other gene-set methods, SNP-PRAGE notably reduced the number of false positives while requiring much less computational effort than permutation-based gene-set approaches.