Genome-wide association study combined with biological context can reveal more disease-related SNPs altering microRNA target seed sites
BMC Genomics 2014, 15:669. doi:10.1186/1471-2164-15-669. Published: 8 August 2014
Emerging studies demonstrate that single nucleotide polymorphisms (SNPs) residing in microRNA recognition element seed sites (MRESSs) in the 3′UTR of mRNAs are putative biomarkers for human diseases and cancers. However, exhaustive experimental validation of the causality of MRESS SNPs is impractical, so bioinformatics approaches have been introduced to predict causal MRESS SNPs. Genome-wide association studies (GWAS) provide a way to test millions of SNPs for disease susceptibility simultaneously while taking linkage disequilibrium (LD) into account, but the multiple-testing corrections applied to suppress the false positive rate come at the cost of sensitivity. In this study, we propose a method to identify candidate causal MRESS SNPs from 12 GWAS datasets without performing multiple-testing corrections. Instead, we use biological context to ensure the credibility of the selected SNPs.
In 11 of the 12 GWAS datasets, MRESS SNPs were over-represented among SNPs with p-value ≤ 0.05 (odds ratio (OR) ranging from 1.1 to 2.4). Moreover, the host genes of susceptible MRESS SNPs in each of the 11 GWAS datasets shared biological context with reported causal genes. Our method identified 286 MRESS SNPs, whereas multiple-testing correction with a threshold of 1 × 10⁻⁵, a common cutoff in GWAS, identified only 13. Of the 286 candidate SNPs, 27 have been reported to be deleterious, while only 2 of the 13 SNPs passing multiple-testing correction were documented in PubMed. MicroRNA-mRNA interactions affected by the 286 candidate SNPs were likely to show negatively correlated expression. These SNPs introduced greater alteration of binding free energy than other MRESS SNPs, especially when grouped by haplotype (mean 4210 vs. 4105 cal/mol for individual SNPs; mean 9781 vs. 8521 cal/mol for haplotypes).
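As an illustration of the over-representation measure reported above, the following minimal sketch computes an odds ratio from a 2 × 2 table of SNP counts (MRESS vs. other SNPs, p-value ≤ 0.05 vs. p-value > 0.05). The function name `odds_ratio`, the Wald-type 95% confidence interval, and the example counts are assumptions made for illustration only; they are not taken from the study, which may have computed the OR differently.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: MRESS SNPs with p <= 0.05      b: MRESS SNPs with p > 0.05
    c: other SNPs with p <= 0.05      d: other SNPs with p > 0.05
    z: normal quantile for the interval (1.96 gives roughly 95%).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    or_value = (a * d) / (b * c)
    # Standard error of log(OR) under the usual Wald approximation.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_value) - z * se_log_or)
    upper = exp(log(or_value) + z * se_log_or)
    return or_value, lower, upper


# Hypothetical counts, NOT data from the paper.
or_value, lower, upper = odds_ratio(a=30, b=270, c=5000, d=95000)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An OR above 1 indicates that MRESS SNPs are more frequent among nominally significant SNPs than among the remaining SNPs; an interval that excludes 1 suggests the enrichment is unlikely to be due to chance alone.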
MRESS SNPs are promising disease biomarkers in multiple GWAS datasets. Integrating GWAS p-values with biological context is a stable and effective way to select candidate causal MRESS SNPs, and it reduces the loss of sensitivity compared with multiple-testing corrections. The 286 candidate causal MRESS SNPs offer researchers a credible starting point for designing future experimental validations.