This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data
Comparison of scoring methods for the detection of causal genes with or without rare variants
1 Institute for Medical Informatics, Statistics, and Epidemiology (IMISE), Universität Leipzig, Härtelstrasse 16-18, 04107 Leipzig, Germany
2 LIFE Center (Leipzig Interdisciplinary Research Cluster of Genetic Factors, Phenotypes, and Environment), Universität Leipzig, Leipzig, Philipp-Rosenthal-Strasse 27, 04103 Leipzig, Germany
3 Translational Center for Regenerative Medicine, Universität Leipzig, Philipp-Rosenthal-Strasse 55, 04103 Leipzig, Germany
4 Department for Cell Therapy, Fraunhofer Institute for Cell Therapy and Immunology, Perlickstrasse 1, 04103 Leipzig, Germany
BMC Proceedings 2011, 5(Suppl 9):S49 doi:10.1186/1753-6561-5-S9-S49
Published: 29 November 2011
Rare causal variants are believed to contribute substantially to the genetic basis of common diseases and quantitative traits. Appropriate statistical methods are required to discover as many disease-relevant variants as possible in a genome-wide screening study.

The publicly available Genetic Analysis Workshop 17 data set consists of 697 individuals and 24,487 genetic variants. It includes a simulated complex disease model with intermediate quantitative phenotypes. We compare four gene-wise scoring methods with respect to their ranking of causal genes, under varying allele frequency thresholds for the collapsing of rare variants and depending on whether rare variants were included at all. For causal genes whose ranks differ clearly between scoring methods, we also compare characteristics such as the number and strength of causal variants. We corroborated our findings with additional simulations.

We found that the maximum statistics method was superior in assigning high ranks to genes with a single strong causal variant, whereas Hotelling’s T² test was superior for genes with several independent causal variants. This was consistent for all phenotypes and was confirmed by single-gene analyses and additional simulations. The multivariate analysis performed similarly to Hotelling’s T² test. The least absolute shrinkage and selection operator (LASSO) analysis was largely comparable with the maximum statistics method.

We conclude that the maximum statistics method is a superior alternative to Hotelling’s T² test if one expects only one independent causal variant per gene with a dominating effect. Such a variant could also be a supermarker derived by collapsing rare variants. Because the true nature of the genetic effect is unknown for real data, both methods need to be taken into consideration.
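The contrast between a maximum-type score and a joint multi-marker score can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`collapse_rare`, `max_statistic`, `joint_r2`) are hypothetical, genotypes are assumed to be coded as minor allele counts (0, 1, 2), the supermarker is assumed to be a simple carrier indicator, and the joint regression R² serves only as a stand-in for a multi-marker test such as Hotelling’s T².

```python
import numpy as np

def collapse_rare(genotypes, maf_threshold):
    """Collapse variants with allele frequency below the threshold into one supermarker.

    genotypes: (n_individuals, n_variants) array of minor allele counts (assumed coding).
    Returns the common variants plus, if any rare variants exist, one carrier indicator.
    """
    g = np.asarray(genotypes, dtype=float)
    maf = g.mean(axis=0) / 2.0           # allele frequency per variant
    rare = maf < maf_threshold
    common = g[:, ~rare]
    if rare.any():
        # Supermarker: 1 if the individual carries at least one rare allele.
        supermarker = (g[:, rare].sum(axis=1) > 0).astype(float)
        return np.column_stack([common, supermarker])
    return common

def max_statistic(markers, phenotype):
    """Gene score: largest squared correlation of any single marker with the phenotype."""
    y = np.asarray(phenotype, dtype=float)
    y = y - y.mean()
    best = 0.0
    for j in range(markers.shape[1]):
        x = markers[:, j] - markers[:, j].mean()
        denom = np.sqrt((x @ x) * (y @ y))
        if denom > 0:                    # skip monomorphic markers
            best = max(best, float((x @ y / denom) ** 2))
    return best

def joint_r2(markers, phenotype):
    """Gene score: R^2 of a joint linear regression on all markers (stand-in only)."""
    y = np.asarray(phenotype, dtype=float)
    X = np.column_stack([np.ones(len(y)), markers])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    yc = y - y.mean()
    return float(1.0 - (resid @ resid) / (yc @ yc))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated gene: 697 individuals, 8 variants with mixed allele frequencies.
    freqs = np.array([0.30, 0.20, 0.15, 0.005, 0.004, 0.003, 0.002, 0.001])
    g = rng.binomial(2, freqs, size=(697, 8))
    y = 0.8 * g[:, 0] + rng.normal(size=697)   # one dominating causal variant
    m = collapse_rare(g, maf_threshold=0.01)
    print("max statistic:", max_statistic(m, y))
    print("joint R^2:    ", joint_r2(m, y))
```

Under these assumptions, the maximum score depends only on the single best marker, while the joint score accumulates signal across markers, which is the intuition behind the differing ranks for genes with one dominant versus several independent causal variants.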