This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data

Open Access Proceedings

Identifying influential regions in extremely rare variants using a fixed-bin approach

Michael Agne1*, Chien-Hsun Huang1, Inchi Hu2, Haitian Wang2, Tian Zheng1 and Shaw-Hwa Lo1*

Author Affiliations

1 Department of Statistics, Columbia University, 1255 Amsterdam Avenue, Room 1005, MC 4690, New York, NY 10027, USA

2 Department of Information Systems, Business Statistics and Operations Management, Hong Kong University of Science and Technology Business School, Clear Water Bay, Kowloon, Hong Kong

BMC Proceedings 2011, 5(Suppl 9):S3  doi:10.1186/1753-6561-5-S9-S3

Published: 29 November 2011


In this study, we analyze the Genetic Analysis Workshop 17 data to identify regions of single-nucleotide polymorphisms (SNPs) in which rare variants exhibit a significant influence on the affected ratio, that is, the proportion of subjects with an affirmative affected status. Under the null hypothesis, rare variants are assumed to be distributed uniformly over case (affected) and control (unaffected) subjects. We attempt to pinpoint regions where the composition differs significantly between case and control subjects, specifically where there are unusually high numbers of rare variants among affected subjects. We focus on private variants, which require a degree of “collapsing” that combines information over several SNPs to obtain meaningful results. Instead of implementing a gene-based approach, in which regions would vary in size and sometimes be too small to yield a strong enough signal, we implement a fixed-bin approach with a preset number of SNPs per region, relying on the assumption that proximity and similarity go hand in hand. Applying 100-SNP and 30-SNP fixed bins, we identify several highly influential regions, which are later seen to contain some of the causal SNPs. The 100-SNP and 30-SNP approaches detected seven and three causal SNPs, respectively, among the most significant regions; two of these SNPs, located in the ELAVL4 gene, were reported by both procedures.
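
The fixed-bin idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fixed_bins`, `binom_upper_tail`, `score_bins`), the genotype layout, and in particular the one-sided binomial test are assumptions chosen for illustration. The sketch splits the SNPs into consecutive bins of a preset size, collapses the rare-variant counts within each bin, and asks whether affected subjects carry more of those variants than the overall affected ratio would predict under the uniform null.

```python
from math import comb


def fixed_bins(n_snps, bin_size):
    """Split SNP indices 0..n_snps-1 into consecutive bins of bin_size.

    The last bin may be shorter when n_snps is not a multiple of bin_size.
    """
    return [range(start, min(start + bin_size, n_snps))
            for start in range(0, n_snps, bin_size)]


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def score_bins(genotypes, affected, bin_size):
    """Score each fixed bin for an excess of rare variants among cases.

    genotypes[s][j] is the rare-allele count of subject s at SNP j
    (hypothetical layout); affected[s] is True for a case subject.
    Assumption: under the null, each rare variant falls on a case with
    probability equal to the affected ratio, independently of the others.
    """
    n_snps = len(genotypes[0])
    affected_ratio = sum(affected) / len(affected)
    results = []
    for snps in fixed_bins(n_snps, bin_size):
        case_count = total = 0
        for row, is_case in zip(genotypes, affected):
            carried = sum(row[j] for j in snps)  # collapse over the bin
            total += carried
            if is_case:
                case_count += carried
        # One-sided test: unusually many rare variants among cases.
        p_value = (binom_upper_tail(case_count, total, affected_ratio)
                   if total else 1.0)
        results.append((snps.start, snps.stop, case_count, total, p_value))
    return results
```

Calling `score_bins(genotypes, affected, 100)` and `score_bins(genotypes, affected, 30)` would correspond to the two bin sizes mentioned above; sorting the results by p-value gives a ranking of regions. Treating variant counts as independent binomial trials is a simplification, so the p-values are best read as a rough ranking score rather than calibrated significance levels.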