SUP: an extension to SLINK to allow a larger number of marker loci to be simulated in pedigrees conditional on trait values
McGill University and Genome Quebec Innovation Centre, Montreal, Canada
BMC Genetics 2006, 7:40 doi:10.1186/1471-2156-7-40Published: 3 July 2006
With the recent advances in high-throughput genotyping technologies that allow for large-scale association mapping of human complex traits, promising statistical designs and methods have been emerging. Efficient simulation software are key elements for the evaluation of the properties of new statistical tests. SLINK is a flexible simulation tool that has been widely used to generate the segregation and recombination processes of markers linked to, and possibly associated with, a trait locus, conditional on trait values in arbitrary pedigrees. In practice, its most serious limitation is the small number of loci that can be simulated, since the complexity of the algorithm scales exponentially with this number.
I describe the implementation of a two-step algorithm to be used in conjunction with SLINK to enable the simulation of a large number of marker loci linked to a trait locus and conditional on trait values in families, with the possibility for the loci to be in linkage disequilibrium. SLINK is used in the first step to simulate genotypes at the trait locus conditional on the observed trait values, and also to generate an indicator of the descent path of the simulated alleles. In the second step, marker alleles or haplotypes are generated in the founders, conditional on the trait locus genotypes simulated in the first step. Then the recombination process between the marker loci takes place conditionally on the descent path and on the trait locus genotypes. This two-step implementation is often computationally faster than other software that are designed to generate marker data linked to, and possibly associated with, a trait locus.
Because the proposed method uses SLINK to simulate the segregation process, it benefits from its flexibility: the trait may be qualitative with the possibility of defining different liability classes (which allows for the simulation of gene-environment interactions or even the simulation of multi-locus effects between unlinked susceptibility regions) or it may be quantitative and normally distributed. In particular, this implementation is the only one available that can generate a large number of marker loci conditional on the set of observed quantitative trait values in pedigrees.