This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data

Open Access Proceedings

Identification of multiple rare variants associated with a disease

Jeesun Jung (1,2)*, Jessica Dantzer (2) and Yunlong Liu (1,2,3)

Author Affiliations

1 Department of Medical and Molecular Genetics, Indiana University School of Medicine, IB 130, 975 West Walnut Street, Indianapolis, IN 46202, USA

2 Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, HS 5000, Indianapolis, IN 46202, USA

3 Center for Medical Genomics, Fairbanks Hall, Indiana University School of Medicine, 340 West 10th Street, Suite 6200, Indianapolis, IN 46202-3082, USA

BMC Proceedings 2011, 5(Suppl 9):S103  doi:10.1186/1753-6561-5-S9-S103

Published: 29 November 2011


Advances in sequencing technologies have facilitated the identification of rare variants responsible for complex diseases. However, statistical methods that can handle the vast amount of data generated and interpret the complicated relationship between these variants and disease have lagged behind. We apply a zero-inflated Poisson regression model to account for the excess of zeros caused by the extremely low frequency of the 24,487 exonic variants in the Genetic Analysis Workshop 17 data. We grouped the 697 subjects in the data set into Europeans, Asians, and Africans on the basis of principal components analysis and counted the total number of rare variants per gene for each individual. We then analyzed these collapsed variants under the assumption that rare variants are enriched in individuals affected by a disease compared with unaffected individuals. We also tested this hypothesis with the quantitative traits Q1, Q2, and Q4. Analyses of the combined 697 individuals and of each ethnic group yielded different results. In the combined population analysis, UGT1A1, which was not part of the simulation model, was associated with disease liability, and FLT1, a causal locus in the simulation model, was associated with Q1. Of the causal loci in the simulation models, FLT1 and KDR were associated with Q1 and VNN1 was correlated with Q2. No genes were significantly associated with Q4. These results demonstrate the feasibility and capability of our new statistical model for detecting multiple rare variants that influence disease risk.
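The collapsing step, in which rare variants are summarized as one count per gene per individual, can be illustrated with a minimal sketch. It assumes that genotypes are coded as minor-allele counts (0, 1, 2), that every variant has already been annotated to a gene, and that "number of rare variants" means the number of rare sites at which an individual carries at least one minor allele (summing allele dosages instead is an equally plausible reading). The function name `collapse_rare_variants`, its input layout, and the `maf_threshold` default of 0.01 are hypothetical; the abstract does not state the frequency cutoff the authors used.

```python
def collapse_rare_variants(genotypes, variant_gene, maf_threshold=0.01):
    """Collapse rare variants into one count per gene per individual.

    genotypes: dict mapping individual ID -> list of minor-allele counts
        (0, 1 or 2), one entry per variant, ordered like variant_gene.
    variant_gene: list giving the gene symbol of each variant.
    maf_threshold: hypothetical cutoff below which a variant is "rare".
    Returns dict individual ID -> dict gene -> number of rare sites carried.
    """
    n = len(genotypes)
    n_variants = len(variant_gene)
    # Minor allele frequency of each variant, estimated from the sample itself.
    allele_totals = [0] * n_variants
    for calls in genotypes.values():
        for j, dosage in enumerate(calls):
            allele_totals[j] += dosage
    maf = [total / (2 * n) for total in allele_totals]
    # Keep polymorphic variants below the cutoff; monomorphic sites are dropped.
    rare = [j for j in range(n_variants) if 0 < maf[j] < maf_threshold]
    genes = sorted({variant_gene[j] for j in rare})

    collapsed = {}
    for person, calls in genotypes.items():
        per_gene = dict.fromkeys(genes, 0)  # every gene starts at zero
        for j in rare:
            if calls[j] > 0:  # assumption: count sites carried, not dosage
                per_gene[variant_gene[j]] += 1
        collapsed[person] = per_gene
    return collapsed


# Toy example (not GAW17 data); a loose cutoff is used because n is tiny.
genotypes = {
    "ID1": [1, 1, 0, 0],
    "ID2": [0, 0, 1, 0],
    "ID3": [0, 0, 0, 2],
    "ID4": [0, 0, 0, 0],
}
variant_gene = ["FLT1", "FLT1", "KDR", "KDR"]
result = collapse_rare_variants(genotypes, variant_gene, maf_threshold=0.2)
print(result["ID1"])  # prints {'FLT1': 2, 'KDR': 0}
```

Because most individuals carry no rare allele in a given gene, the resulting counts are dominated by zeros, which is the excess of zeros the zero-inflated Poisson model is meant to absorb.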
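The zero-inflated Poisson (ZIP) idea can also be sketched. One plausible reading of the abstract is that, for each gene, the collapsed rare-variant count y_i of individual i is the response and affection status x_i (0 = unaffected, 1 = affected) is the predictor. Under the standard ZIP model, y_i is a structural zero with probability π and otherwise follows a Poisson distribution with mean λ_i, where log λ_i = β0 + β1 x_i; enrichment of rare variants in affected individuals corresponds to β1 > 0. The sketch fits this model by expectation-maximization (EM) and compares it with the β1 = 0 model by a likelihood-ratio test. The constant π, the binary predictor, the EM fitting, the likelihood-ratio test, and all function names are assumptions of this illustration; the abstract does not give the authors' parameterization, covariates (for example, ancestry group), or test statistic.

```python
import math


def zip_loglik(y, groups, pi, lam):
    """Log-likelihood of a ZIP model: P(0) = pi + (1 - pi) * exp(-lam),
    P(k) = (1 - pi) * lam**k * exp(-lam) / k!  for k >= 1."""
    ll = 0.0
    for yi, g in zip(y, groups):
        if yi == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam[g]))
        else:
            ll += (math.log(1 - pi) + yi * math.log(lam[g]) - lam[g]
                   - math.lgamma(yi + 1))
    return ll


def fit_zip_em(y, groups, n_groups, tol=1e-10, max_iter=10_000):
    """Fit a ZIP model with one zero-inflation probability pi and one Poisson
    mean per group label (0 .. n_groups - 1) by expectation-maximization."""
    n = len(y)
    pi = 0.5
    lam = []
    for g in range(n_groups):
        ys = [yi for yi, gi in zip(y, groups) if gi == g]
        lam.append(max(sum(ys) / len(ys), 1e-10))  # start at the group mean
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each observed zero is structural.
        w = [pi / (pi + (1 - pi) * math.exp(-lam[g])) if yi == 0 else 0.0
             for yi, g in zip(y, groups)]
        # M-step: pi is the mean weight; each lambda is the group's count total
        # divided by its expected number of non-structural observations.
        pi = sum(w) / n
        for g in range(n_groups):
            total = sum(yi for yi, gi in zip(y, groups) if gi == g)
            weight = sum(1 - wi for wi, gi in zip(w, groups) if gi == g)
            lam[g] = max(total / weight, 1e-10)
        ll = zip_loglik(y, groups, pi, lam)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, lam, ll


def zip_enrichment_test(y, x):
    """Likelihood-ratio test of H0: beta1 = 0 in log(lambda) = beta0 + beta1*x,
    where x is a hypothetical 0/1 affection-status coding."""
    _, _, ll_null = fit_zip_em(y, [0] * len(y), 1)
    _, lam, ll_alt = fit_zip_em(y, x, 2)
    beta1 = math.log(lam[1] / lam[0])
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return beta1, stat, p_value


# Toy per-gene counts (not GAW17 data): 50 unaffected, then 50 affected.
y = ([0] * 40 + [1] * 8 + [2] * 2
     + [0] * 20 + [1] * 10 + [2] * 10 + [3] * 6 + [4] * 4)
x = [0] * 50 + [1] * 50
beta1, stat, p = zip_enrichment_test(y, x)
print(f"beta1={beta1:.3f}  LRT={stat:.2f}  p={p:.2e}")
```

With a binary predictor, the Poisson part of the M-step has a closed form, which keeps the sketch free of external dependencies. A quantitative trait such as Q1 used as the predictor would instead require a general optimizer in that step, and for real analyses a library implementation (for example, the ZeroInflatedPoisson model in statsmodels) would be the more natural choice.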