This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data
Digging into the extremes: a useful approach for the analysis of rare variants with continuous traits?
Division of Genetic Epidemiology, Department of Medical Genetics, Molecular, and Clinical Pharmacology, Innsbruck Medical University, Schöpfstrasse 41, 6020 Innsbruck, Austria
BMC Proceedings 2011, 5(Suppl 9):S105. doi:10.1186/1753-6561-5-S9-S105. Published: 29 November 2011
The common disease/rare variant hypothesis predicts that rare variants with large effects have a strong impact on the corresponding phenotypes. It is therefore assumed that rare functional variants are enriched in the extremes of the phenotype distribution. In this analysis of the Genetic Analysis Workshop 17 data set, my aim is to detect genes harboring rare variants associated with quantitative traits. I use two general approaches: testing association with the complete distribution of trait values by linear regression, and applying statistical tests to the tails of the distribution (bottom 10% versus top 10% of values). Three methods are used for this extreme phenotype approach: Fisher’s exact test, the weighted-sum method, and the beta method. Rare variants were collapsed at the gene level. Linear regression on all values provided the highest power to detect rare variants. Of the three extreme phenotype methods, the beta method performed best. Furthermore, the sample size in this approach was increased by adding further individuals with extreme phenotype values. Doubling the extreme phenotype sample in this way, which corresponds to only 40% of the sample size used for the continuous trait analysis, yielded power comparable to or even higher than that of linear regression. If samples are selected primarily for sequencing, enriching the sample with individuals who have extreme values of the phenotype of interest, rather than sampling from the general population, yields higher power to detect rare variants than analyzing a population-based sample of equivalent size.
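To make the extreme phenotype design concrete, the following is a minimal sketch of one variant of it: select the bottom and top tails of a quantitative trait, collapse a gene's rare variants into a carrier indicator, and compare carrier counts between the tails with a two-sided Fisher's exact test. The abstract does not state the exact collapsing rule, so the "carries at least one rare allele in the gene" indicator used here is an assumption, as is the input format (a per-subject list of minor-allele counts, already restricted to the rare variants of one gene). The function names are hypothetical, and the weighted-sum and beta methods are not shown.

```python
from __future__ import annotations

from math import comb
from typing import Sequence


def extreme_groups(trait: Sequence[float], fraction: float = 0.10) -> tuple[list[int], list[int]]:
    """Return subject indices in the bottom and top `fraction` of trait values."""
    n_tail = int(len(trait) * fraction)  # floor of the tail size
    order = sorted(range(len(trait)), key=lambda i: trait[i])
    return order[:n_tail], order[len(trait) - n_tail:]


def is_carrier(rare_genotypes: Sequence[int]) -> bool:
    """Assumed collapsing rule: carrier if any rare variant in the gene has a minor allele."""
    return any(g > 0 for g in rare_genotypes)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # P(top-left cell == x) given fixed row and column margins
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so ties with the observed table are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def extreme_carrier_test(trait, gene_genotypes, fraction=0.10):
    """Compare gene-level carrier counts between the top and bottom trait tails."""
    low, high = extreme_groups(trait, fraction)
    low_carriers = sum(is_carrier(gene_genotypes[i]) for i in low)
    high_carriers = sum(is_carrier(gene_genotypes[i]) for i in high)
    table = (high_carriers, len(high) - high_carriers,
             low_carriers, len(low) - low_carriers)
    return table, fisher_exact_two_sided(*table)
```

For example, with 20 subjects whose trait values are 0 to 19, `fraction=0.25` selects five subjects per tail; if only the five highest-ranked subjects carry a rare allele, the table is `(5, 0, 0, 5)` and the two-sided p-value is 2/252. Implementing the test with `math.comb` keeps the sketch dependency-free; in practice an established routine such as SciPy's `fisher_exact` would normally be preferred.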