This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data
Genetic Analysis Workshop 17 mini-exome simulation
1 Department of Genetics, Texas Biomedical Research Institute, 7620 NW Loop 410, San Antonio, TX 78245, USA
2 Centro de Investigación en Biología Molecular y Celular, Universidad de Costa Rica, San José, Costa Rica
3 Menzies Research Institute, 17 Liverpool St (Private Bag 23), Hobart, Tasmania 7001, Australia
BMC Proceedings 2011, 5(Suppl 9):S2 doi:10.1186/1753-6561-5-S9-S2
Published: 29 November 2011
The data set simulated for Genetic Analysis Workshop 17 was designed to mimic a subset of the data that might be produced in a full exome screen for a complex disorder and its related risk factors, so that workshop participants could investigate issues of study design and statistical genetic analysis. Real sequence data from the 1000 Genomes Project formed the basis for simulating a common disease trait with a prevalence of 30% and three related quantitative risk factors in two samples: 697 unrelated individuals and 697 individuals in large, extended pedigrees. Workshop participants were given called genotypes for 24,487 autosomal markers assigned to 3,205 genes, together with simulated affection status, quantitative traits, age, sex, pedigree relationships, and cigarette smoking. The simulating model included both common and rare variants, with minor allele frequencies ranging from 0.07% to 25.8% and a wide range of effect sizes. Genotype-smoking interaction effects were included for variants in one gene. Functional variants were concentrated in genes selected from specific biological pathways and were chosen on the basis of the predicted deleteriousness of the coding change. For each of the two samples (unrelated individuals and families), 200 replicates of the phenotypes were simulated.
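The replicate structure described above can be illustrated with a minimal sketch. This is not the workshop's simulating model: the liability-threshold mechanism, the Hardy-Weinberg genotype draw, the four minor allele frequencies, the effect sizes, and all function names are assumptions chosen for illustration only. The sketch holds one set of genotypes fixed and redraws the phenotypes 200 times, marking the top 30% of liabilities in each replicate as affected.

```python
import random

def simulate_genotypes(n_individuals, mafs, rng):
    """Draw 0/1/2 minor-allele counts per variant.

    Assumption: independent variants in Hardy-Weinberg equilibrium.
    (The workshop used real 1000 Genomes sequence data instead.)
    """
    return [[(rng.random() < maf) + (rng.random() < maf) for maf in mafs]
            for _ in range(n_individuals)]

def simulate_replicate(genotypes, effects, prevalence, rng):
    """Simulate one phenotype replicate for a fixed set of genotypes.

    Assumption: liability = additive genetic effect + standard normal noise,
    and the individuals with the highest liabilities are called affected.
    """
    liabilities = [sum(g * b for g, b in zip(row, effects)) + rng.gauss(0.0, 1.0)
                   for row in genotypes]
    n_affected = round(prevalence * len(liabilities))
    # Rank individuals by liability, highest first, and mark the top fraction.
    order = sorted(range(len(liabilities)),
                   key=lambda i: liabilities[i], reverse=True)
    affected = [0] * len(liabilities)
    for i in order[:n_affected]:
        affected[i] = 1
    return liabilities, affected

rng = random.Random(17)

# Hypothetical values: only the endpoints 0.07% and 25.8% come from the text.
mafs = [0.0007, 0.01, 0.05, 0.258]
effects = [1.0, 0.6, 0.3, 0.1]  # hypothetical: larger effects for rarer variants

genotypes = simulate_genotypes(697, mafs, rng)
replicates = [simulate_replicate(genotypes, effects, 0.30, rng)
              for _ in range(200)]

print(len(replicates), "replicates;",
      sum(replicates[0][1]), "affected in replicate 1")
```

Fixing the number of affected individuals per replicate is a simplification. A population-level threshold would instead let the sample prevalence vary around 30% from one replicate to the next.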