This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data

Open Access Proceedings

Rare variant collapsing in conjunction with mean log p-value and gradient boosting approaches applied to Genetic Analysis Workshop 17 data

Yauheniya Cherkas1, Nandini Raghavan2, Stephan Francke3, Frank DeFalco4 and Marsha A Wilcox1*

Author Affiliations

1 Epidemiology, Johnson & Johnson, 1125 Trenton-Harbourton Road, Titusville, NJ 08560, USA

2 Non-Clinical Biostatistics, Johnson & Johnson, OMP Building, 1000 Route 202-S, Raritan, NJ 08869, USA

3 Pharmacogenomics, Johnson & Johnson PRD, PO Box 300, 1000 Route 202, Raritan, NJ 08869, USA

4 Informatics, Johnson & Johnson, 920 Route 202, Raritan, NJ 08869, USA

BMC Proceedings 2011, 5(Suppl 9):S94  doi:10.1186/1753-6561-5-S9-S94

Published: 29 November 2011


In addition to methods that can identify common variants associated with susceptibility to common diseases, there has been increasing interest in approaches that can identify rare genetic variants. We use the simulated data provided to participants of Genetic Analysis Workshop 17 (GAW17) to identify both rare and common single-nucleotide polymorphisms (SNPs) and pathways associated with disease status. We apply a rare variant collapsing approach and standard association tests for common variants to identify candidates for further analysis using pathway-based and tree-based ensemble approaches. We use the mean log p-value approach to identify a top set of pathways and compare this set with the pathways used to simulate the GAW17 dataset. We conclude that the mean log p-value approach places those pathways, as well as related pathways, in the top list. We also apply stochastic gradient boosting to the selected subset of SNPs. When we compared the results of this tree-based method with the list of SNPs used in the dataset simulation, we observed a number of false positives in addition to the correct SNPs.
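The two screening steps named above can be illustrated with a minimal sketch. It assumes a simple "any rare allele" collapsing rule (an individual scores 1 for a gene if they carry at least one minor allele at any variant below a minor allele frequency threshold, here a hypothetical 0.01) and a pathway score defined as the mean of -log10(p) over the pathway's genes that have a p-value. The abstract does not specify the exact collapsing scheme, threshold, or p-value aggregation used, so the functions `collapse_rare_variants`, `mean_log_p`, and `rank_pathways` are hypothetical illustrations, not the authors' implementation.

```python
import math
from statistics import mean


def collapse_rare_variants(genotypes, mafs, threshold=0.01):
    """Collapse rare variants in one gene into a single carrier indicator.

    genotypes: dict variant_id -> list of minor-allele counts (0/1/2), one per individual.
    mafs: dict variant_id -> minor allele frequency.
    threshold: hypothetical MAF cutoff below which a variant counts as rare.
    Returns a list with 1 for individuals carrying >= 1 rare minor allele, else 0.
    """
    rare = [v for v, maf in mafs.items() if maf < threshold]
    n_individuals = len(next(iter(genotypes.values())))  # assumes non-empty input
    return [
        int(any(genotypes[v][i] > 0 for v in rare))
        for i in range(n_individuals)
    ]


def mean_log_p(pvalues, pathway_genes):
    """Mean of -log10(p) over pathway genes that have an association p-value."""
    scores = [-math.log10(pvalues[g]) for g in pathway_genes if g in pvalues]
    return mean(scores) if scores else 0.0


def rank_pathways(pvalues, pathways):
    """Order pathway names from highest to lowest mean -log10(p) score."""
    return sorted(
        pathways,
        key=lambda name: mean_log_p(pvalues, pathways[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical toy data: three variants in one gene, four individuals.
    genotypes = {"v1": [0, 1, 0, 0], "v2": [0, 0, 0, 2], "v3": [1, 1, 1, 0]}
    mafs = {"v1": 0.005, "v2": 0.008, "v3": 0.30}
    print(collapse_rare_variants(genotypes, mafs))

    # Hypothetical gene-level p-values and pathway memberships.
    pvalues = {"GENE_A": 1e-4, "GENE_B": 0.01, "GENE_C": 0.5}
    pathways = {"P1": ["GENE_A", "GENE_B"], "P2": ["GENE_C", "GENE_D"]}
    print(rank_pathways(pvalues, pathways))
```

With the toy data, only v1 and v2 fall below the cutoff, so individuals 2 and 4 are flagged as carriers, and pathway P1 (mean score 3.0) ranks above P2. Averaging, rather than summing, -log10(p) keeps large pathways from outscoring small ones simply because they contain more genes; genes without a p-value (GENE_D here) are skipped rather than counted as zero, which is one of several reasonable conventions.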