This article is part of the supplement: Genetic Analysis Workshop 17: Unraveling Human Exome Data

Open Access Proceedings

Genetic Analysis Workshop 17 mini-exome simulation

Laura Almasy1*, Thomas D Dyer1, Juan Manuel Peralta2, Jack W Kent1, Jac C Charlesworth3, Joanne E Curran1 and John Blangero1

Author Affiliations

1 Department of Genetics, Texas Biomedical Research Institute, 7620 NW Loop 410, San Antonio, TX 78245, USA

2 Centro de Investigación en Biología Molecular y Celular, Universidad de Costa Rica, San José, Costa Rica

3 Menzies Research Institute, 17 Liverpool St (Private Bag 23), Hobart, Tasmania 7001, Australia

BMC Proceedings 2011, 5(Suppl 9):S2  doi:10.1186/1753-6561-5-S9-S2

Published: 29 November 2011

Abstract

The data set simulated for Genetic Analysis Workshop 17 was designed to mimic a subset of the data that might be produced in a full exome screen for a complex disorder and related risk factors, so that workshop participants could investigate issues of study design and statistical genetic analysis. Real sequence data from the 1000 Genomes Project formed the basis for simulating a common disease trait with a prevalence of 30% and three related quantitative risk factors in a sample of 697 unrelated individuals and in a second sample of 697 individuals in large, extended pedigrees. Workshop participants were provided with called genotypes for 24,487 autosomal markers assigned to 3,205 genes, together with simulated affection status, quantitative traits, age, sex, pedigree relationships, and cigarette smoking. The simulating model included both common and rare variants, with minor allele frequencies ranging from 0.07% to 25.8%, and a wide range of effect sizes for these variants. Genotype-smoking interaction effects were included for variants in one gene. Functional variants were concentrated in genes selected from specific biological pathways and were chosen on the basis of the predicted deleteriousness of the coding change. For each of the two samples (unrelated individuals and families), 200 replicates of the phenotypes were simulated.
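
To make the idea of a simulated replicate concrete, the following is a minimal sketch of how a binary trait with 30% prevalence might be generated from variant genotypes. It is not the workshop's simulating model. The function name `simulate_replicate`, the additive liability with standard normal noise, the rule that labels the top 30% of liabilities as affected, and the effect sizes are all assumptions made for illustration. The placeholder genotypes are drawn at random under Hardy-Weinberg proportions only so that the sketch runs on its own, whereas the workshop data were based on real 1000 Genomes Project sequence data.

```python
import random


def simulate_replicate(n_individuals, mafs, effects, prevalence=0.30, seed=0):
    """Simulate one hypothetical replicate of genotypes and affection status.

    Assumptions (not taken from the article): additive variant effects on a
    liability with standard normal noise, placeholder genotypes drawn under
    Hardy-Weinberg proportions, and affection assigned to the top
    `prevalence` fraction of the sample.
    """
    rng = random.Random(seed)
    genotypes = []
    liabilities = []
    for _ in range(n_individuals):
        # Minor-allele count (0, 1, or 2) per variant from two allele draws.
        g = [(rng.random() < p) + (rng.random() < p) for p in mafs]
        genotypes.append(g)
        genetic_part = sum(b * x for b, x in zip(effects, g))
        liabilities.append(genetic_part + rng.gauss(0.0, 1.0))

    # Rank individuals by liability and label the top fraction as affected.
    n_affected = round(prevalence * n_individuals)
    order = sorted(range(n_individuals), key=lambda i: liabilities[i], reverse=True)
    affected = [0] * n_individuals
    for i in order[:n_affected]:
        affected[i] = 1
    return genotypes, liabilities, affected


# One rare and one common variant at the frequency extremes quoted in the
# abstract (0.07% and 25.8%); the effect sizes 1.5 and 0.2 are hypothetical.
genotypes, liabilities, affected = simulate_replicate(
    n_individuals=697, mafs=[0.0007, 0.258], effects=[1.5, 0.2], seed=1
)
```

Calling the function with a different `seed` for each replicate would give a set of independent phenotype replicates over the same sample size, which is the general shape of the 200-replicate design described above.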