This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data
Identifying influential regions in extremely rare variants using a fixed-bin approach
1 Department of Statistics, Columbia University, 1255 Amsterdam Avenue, Room 1005, MC 4690, New York, NY 10027, USA
2 Department of Information Systems, Business Statistics and Operations Management, Hong Kong University of Science and Technology Business School, Clear Water Bay, Kowloon, Hong Kong
BMC Proceedings 2011, 5(Suppl 9):S3 doi:10.1186/1753-6561-5-S9-S3 Published: 29 November 2011
In this study, we analyze the Genetic Analysis Workshop 17 data to identify regions of rare-variant single-nucleotide polymorphisms (SNPs) that have a significant influence on the response rate (the proportion of subjects with an affirmative affected status), which we call the affected ratio. Under the null hypothesis, rare variants are assumed to be distributed uniformly over case (affected) and control (unaffected) subjects. We attempt to pinpoint regions where the composition differs significantly between cases and controls, specifically regions with unusually high numbers of rare variants among affected subjects. We focus on private variants, which must be “collapsed” to some degree, combining information over several SNPs, to obtain meaningful results. Instead of a gene-based approach, in which regions vary in size and are sometimes too small to yield a strong enough signal, we implement a fixed-bin approach with a preset number of SNPs per region, relying on the assumption that proximity and similarity go hand in hand. Applying 100-SNP and 30-SNP fixed bins, we identify several of the most influential regions, which are subsequently found to contain some of the causal SNPs. Among the most significant regions, the 100-SNP and 30-SNP approaches detected seven and three causal SNPs, respectively; two of these SNPs, located in the ELAVL4 gene, were reported by both procedures.
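The fixed-bin idea can be illustrated with a short sketch. This is not the authors' implementation and does not reproduce their exact test statistic; it is a minimal example under stated assumptions. The names `fixed_bin_scan` and `binomial_upper_tail`, the one-sided binomial test, and the treatment of each rare-allele copy as an independent draw are all hypothetical choices made here. The input is assumed to be already restricted to rare variants, with SNPs in genomic order. Under the uniform null, each rare-allele copy falls in a case with probability equal to the fraction of subjects who are cases, so a bin whose case count is unusually large receives a small upper-tail p-value.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fixed_bin_scan(genotypes, affected, bin_size):
    """Hypothetical fixed-bin scan for excess rare variants among cases.

    genotypes: one list per subject of minor-allele counts (0/1/2),
               one entry per SNP, SNPs in genomic order (assumed rare only).
    affected:  one bool per subject (True = case).
    bin_size:  preset number of SNPs per bin (e.g. 100 or 30).

    Returns (start, end, case_count, total_count, p_value) tuples,
    sorted from most to least significant. `end` is exclusive.
    """
    n_subjects = len(affected)
    # Under the assumed uniform null, each allele copy lands in a case
    # with probability equal to the proportion of cases.
    p_case = sum(affected) / n_subjects
    n_snps = len(genotypes[0])

    results = []
    for start in range(0, n_snps, bin_size):
        end = min(start + bin_size, n_snps)  # last bin may be shorter
        case_count = 0
        total = 0
        for g, is_case in zip(genotypes, affected):
            copies = sum(g[start:end])  # collapse all SNPs in the bin
            total += copies
            if is_case:
                case_count += copies
        # One-sided test: are there more copies among cases than expected?
        p_value = binomial_upper_tail(case_count, total, p_case) if total else 1.0
        results.append((start, end, case_count, total, p_value))

    return sorted(results, key=lambda r: r[4])
```

For example, with four subjects (the first two affected) and four SNPs split into bins of two, a bin in which all three rare-allele copies occur in cases gets a p-value of 0.5³ = 0.125, while a bin whose copies all occur in controls gets 1.0. Collapsing over a fixed window trades positional resolution for power: a single private variant carries almost no information, but summing copies across a bin can accumulate enough of them to reveal an excess among cases.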