RS-SNP: a random-set method for genome-wide association studies
1 Istituto di Studi sui Sistemi Intelligenti per l'Automazione - CNR, Via Amendola 122/D-I, 70126 Bari, Italy
2 Ospedale "Casa Sollievo della Sofferenza" IRCCS, Laboratorio di Gastroenterologia, Foggia, Italy
3 Departments of Statistical Science, Computer Science, Mathematics, Institute for Genome Sciences & Policy, Duke University, Durham, NC, USA
Citation and License
BMC Genomics 2011, 12:166 doi:10.1186/1471-2164-12-166Published: 30 March 2011
The typical objective of Genome-wide association (GWA) studies is to identify single-nucleotide polymorphisms (SNPs) and corresponding genes with the strongest evidence of association (the 'most-significant SNPs/genes' approach). Borrowing ideas from micro-array data analysis, we propose a new method, named RS-SNP, for detecting sets of genes enriched in SNPs moderately associated to the phenotype. RS-SNP assesses whether the number of significant SNPs, with p-value P ≤ α, belonging to a given SNP set is statistically significant. The rationale of proposed method is that two kinds of null hypotheses are taken into account simultaneously. In the first null model the genotype and the phenotype are assumed to be independent random variables and the null distribution is the probability of the number of significant SNPs in greater than observed by chance. The second null model assumes the number of significant SNPs in depends on the size of and not on the identity of the SNPs in . Statistical significance is assessed using non-parametric permutation tests.
We applied RS-SNP to the Crohn's disease (CD) data set collected by the Wellcome Trust Case Control Consortium (WTCCC) and compared the results with GENGEN, an approach recently proposed in literature. The enrichment analysis using RS-SNP and the set of pathways contained in the MSigDB C2 CP pathway collection highlighted 86 pathways rich in SNPs weakly associated to CD. Of these, 47 were also indicated to be significant by GENGEN. Similar results were obtained using the MSigDB C5 pathway collection. Many of the pathways found to be enriched by RS-SNP have a well-known connection to CD and often with inflammatory diseases.
The proposed method is a valuable alternative to other techniques for enrichment analysis of SNP sets. It is well founded from a theoretical and statistical perspective. Moreover, the experimental comparison with GENGEN highlights that it is more robust with respect to false positive findings.