Open Access Highly Accessed Research article

Stability SCAD: a powerful approach to detect interactions in large-scale genomic study

Jianwei Gou1,6, Yang Zhao1, Yongyue Wei1, Chen Wu2, Ruyang Zhang1,3, Yongyong Qiu1, Ping Zeng1, Wen Tan2, Dianke Yu2, Tangchun Wu4, Zhibin Hu1,3,5, Dongxin Lin2, Hongbing Shen1,3,5 and Feng Chen1*

Author Affiliations

1 Department of Epidemiology and Biostatistics and Ministry of Education (MOE) Key Lab for Modern Toxicology, School of Public Health, Nanjing Medical University, Nanjing, China

2 State Key Laboratory of Molecular Oncology and Department of Etiology and Carcinogenesis, Cancer Institute and Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

3 Section of Clinical Epidemiology, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Cancer Center, Nanjing Medical University, Nanjing, China

4 Institute of Occupational Medicine and Ministry of Education Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China

5 State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China

6 Department of Mathematical and Statistical Sciences, Nanjing Forestry University, Nanjing, China

BMC Bioinformatics 2014, 15:62  doi:10.1186/1471-2105-15-62

Published: 1 March 2014

Abstract

Background

Evidence suggests that common complex diseases may be partially due to SNP-SNP interactions, but reliable detection of such interactions has yet to be established in high-dimensional, small-sample (small-n-large-p) studies. A number of penalized regression techniques are gaining popularity within the statistical community and are now being applied to detect interactions. These techniques tend to overfit and are prone to false positives. The recently developed stability least absolute shrinkage and selection operator (SLASSO) has been used to control the family-wise error rate, but often at the expense of power (and thus at the cost of false negatives).

Results

Here, we propose an alternative stability selection procedure, stability smoothly clipped absolute deviation (SSCAD). Briefly, this method applies a smoothly clipped absolute deviation (SCAD) algorithm to multiple sub-samples and then identifies a cluster ensemble of interactions across the sub-samples. The proposed method was compared with SLASSO and two traditional penalized methods in intensive simulations. The simulations showed higher power and a lower false discovery rate (FDR) with SSCAD. Applied to a previously published lung cancer GWAS, the new method confirmed all significant interactions identified with SLASSO and identified two additional interactions not reported in the SLASSO analysis.
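
The sub-sample-and-aggregate idea described above can be illustrated with a minimal sketch. This is not the authors' implementation, and it makes several simplifying assumptions: a continuous response with squared-error loss instead of penalized logistic regression, a single fixed penalty `lam` instead of a tuned path, generic feature columns (which could hold SNP-SNP product terms), and plain selection frequency with a hypothetical cut-off `threshold` in place of the paper's cluster-ensemble step. The helper names (`scad_fit`, `stability_scad`, `make_toy_data`) are illustrative only.

```python
import random

def scad_threshold(z, lam, a=3.7):
    """Univariate SCAD solution for a standardized predictor (a=3.7 is the usual default)."""
    sign = 1.0 if z >= 0 else -1.0
    az = abs(z)
    if az <= 2 * lam:                      # soft-thresholding zone
        return sign * max(az - lam, 0.0)
    if az <= a * lam:                      # transition zone
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    return z                               # large effects are left unshrunk

def standardize(rows):
    """Return columns centered and scaled to unit mean square."""
    n, p = len(rows), len(rows[0])
    cols = []
    for j in range(p):
        col = [r[j] for r in rows]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - mu) / sd for v in col])
    return cols

def scad_fit(rows, y, lam, a=3.7, sweeps=100, tol=1e-8):
    """Coordinate descent for SCAD-penalized least squares (assumption: linear model)."""
    cols = standardize(rows)
    n, p = len(y), len(cols)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]
    beta = [0.0] * p
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            xj = cols[j]
            # Partial-residual correlation for feature j
            z = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = scad_threshold(z, lam, a)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, xj)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

def stability_scad(rows, y, lam, n_sub=50, threshold=0.6, seed=0):
    """Fit SCAD on random half-samples and keep features selected often enough."""
    rng = random.Random(seed)
    n, p = len(y), len(rows[0])
    counts = [0] * p
    for _ in range(n_sub):
        idx = rng.sample(range(n), n // 2)  # sub-sample without replacement
        beta = scad_fit([rows[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    freq = [c / n_sub for c in counts]
    stable = [j for j, f in enumerate(freq) if f >= threshold]
    return freq, stable

def make_toy_data(n=100, p=8, seed=1):
    """Synthetic data: only features 0 and 1 carry signal (illustrative, not real SNPs)."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * r[0] + 3 * r[1] + rng.gauss(0, 1) for r in rows]
    return rows, y

if __name__ == "__main__":
    rows, y = make_toy_data()
    freq, stable = stability_scad(rows, y, lam=0.5)
    print("selection frequencies:", freq)
    print("stable features:", stable)
```

Because SCAD leaves large coefficients unshrunk, strong effects tend to be selected in nearly every sub-sample, while noise features rarely clear the penalty. A frequency cut-off is one simple way to turn that contrast into a selection rule.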

Conclusions

Based on the results of this study, SSCAD appears to be a powerful procedure for detecting SNP-SNP interactions in large-scale genomic data.

Keywords:
Genome-wide association study (GWAS); Interaction; Least absolute shrinkage and selection operator (LASSO); Penalized logistic regression; Smoothly clipped absolute deviation (SCAD); Stability selection