Abstract
Background
Complex diseases are multifactorial traits caused by both genetic and environmental factors. They represent the major part of human diseases and include those with the largest prevalence and mortality (cancer, heart disease, obesity, etc.). Despite the large amount of information that has been collected about both genetic and environmental risk factors, there are few examples of studies on their interactions in the epidemiological literature. One reason may be the incomplete knowledge of the power of the statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this aim, a possible strategy is to challenge the different statistical methods against data sets where the underlying phenomenon is completely known and fully controllable, for example simulated ones.
Results
We present a mathematical approach that models gene-environment interactions. With this method it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors, and also allowing nonlinear interactions such as epistasis. In particular, we implemented a simple version of this model in a Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. The main aim has been to allow the input of population characteristics using standard epidemiological measures and to implement constraints that make the simulator's behaviour biologically meaningful.
Conclusions
With the multi-logistic model implemented in GENS it is possible to simulate case-control samples of complex diseases where gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated population and a Monte Carlo process allows random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of the statistical power when designing a study.
Background
Complex Diseases (CD) are caused by variations in multiple loci interacting with each other and with environmental factors [1]. Many complex traits, such as cancer, heart disease, obesity, diabetes, and many common psychiatric and neurological conditions, are among the human diseases with the largest prevalence and mortality [2,3].
The concept of Gene-Environment interaction (GxE) is theoretically central in CD [4]. It is widely accepted that GxE must be considered in CD to avoid a serious underestimation of the disease risk and inconsistent replication among different studies. Furthermore, taking GxE into account could focus medical intervention by identifying subgroups of individuals who are more susceptible to specific environmental exposures [5]. However, there are very few examples of well-described GxE in the scientific literature [6]. Instead, a large amount of information has been collected about genetic and environmental risk factors taken individually, because the majority of studies examined the main effect of single factors instead of their interactions [6-8].
In our opinion, one reason for such a failure could be the statistical approach. Several statistical methods aimed at identifying interactions among factors have been described and used to identify GxE, such as Logistic Regression [9] and Multifactor Dimensionality Reduction (MDR) [10,11]. However, the performance of these methods can be influenced by many variables, such as the sample size, the number of factors involved, the type of interaction, the model of inheritance, the allelic frequencies, the distributions of the environmental factors, and the relative strength of the different factors affecting the risk of disease. Unfortunately, only some of these characteristics are known, and only in a few real populations; therefore there is not enough information to assess the performance of statistical methods.
In this scenario, an alternative approach is to use simulated populations to assess the statistical power of different methods. In population genetics, although there are several genetic data simulators (for a complete list see [12]), the vast majority have been developed to study the evolution of genomic sequences across generations (such as coalescent [13] and forward-time methods [14]; for a review see [15]). Besides these tools, many others that simulate pedigrees have also been developed. They support linkage analysis in family pedigrees and, hence, are useful mainly for mapping loci involved in Mendelian diseases [16-22].
Regarding the modelling of the role of genetics in common multifactorial complex diseases, few models have been developed to date. GWAsimulator was developed mainly to simulate patterns of linkage disequilibrium (LD) among SNPs in genome-wide studies [23]; it does not consider any role of the environment in the risk of disease. In contrast, modelling the effect of environmental factors on the risk of disease is a very large field of epidemiology [24]. It is generally accepted that the effect of an environmental exposure on the disease risk can follow a logistic function; indeed, logistic regression is the most widely used statistical tool for environmental factors.
Among others, two software packages, SIMLA [17] and QUANTO [25], are specifically designed for data sets where the disease risk is a function of interactions between genetic and environmental factors. In both models, the disease risk is based on a logistic function whose covariates are genetic factors, environmental factors and their interactions. In SIMLA, data for three generations of families are simulated and the disease risk is a function of up to two genetic and two environmental factors. The user can input the relative risk associated with single factors and also with combinations of any two factors. QUANTO is a tool designed to estimate the power of matched case-control, case-sib, or case-parent studies and does not actually produce simulated data sets. In QUANTO the disease risk is a function of a one gene-one environment interaction, and the user can input the risks associated with the environmental factor, with the genetic factor and with their interaction. SIMLA and QUANTO are valuable tools for the modelling of complex diseases, because they explicitly consider the role of GxE in disease risk. However, some limitations still exist. For example, in SIMLA it is not straightforward to simulate data of unrelated individuals such as those in case-control data sets. Furthermore, the user inputs the risk associated with each factor and with each interaction of factors. As a result, once the logistic model is built, the marginal risks that result for each single factor are not the same as those previously input. This can be a limitation when simulating a real data set where only the marginal risks of single factors are known and nothing is known about their relationships. Finally, these tools describe the interactions between genetic and environmental factors only in a linear way, and they are not easily extensible to more complex interactions.
We propose a novel method, the Multi-Logistic Model, that mathematically describes gene-environment interactions similar to those found in case-control studies. With this method it is possible to model GxE in any form, involving any number of genetic and environmental factors, also allowing gene-gene interactions such as epistasis. A simple version has been implemented in the Gene-Environment iNteraction Simulator (GENS), designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. Moreover, to make it easier to simulate data close to those of previous studies or of the literature, we used common epidemiological measures as input. This also makes the tool friendlier to the biomedical community.
Results
The Multi-Logistic Model for gene-environment interaction
The mathematical approach behind the simulation of the disease risk involving GxE is based on a system of logistic relationships. We called this approach the Multi-Logistic Model (MLM) and designed it specifically to describe disease risk in data sets that simulate case-control samples. In the simulated data sets, each individual has $G$ genetic factors and is exposed to $E$ environmental factors. Genetic factors are denoted by $g^{a}_{i_a}$, where $a = 1, \ldots, G$. The genetic factors are biallelic Single Nucleotide Polymorphisms, which result in three diploid genotypes, namely the first homozygote (AA, $i_a = 1$), the heterozygote (Aa, $i_a = 2$) and the second homozygote (aa, $i_a = 3$). Genotype frequencies for each factor are denoted by $P^{G}(g^{a}_{i_a})$, where $\sum_{i_a=1}^{3} P^{G}(g^{a}_{i_a}) = 1$ for all $a$. The environmental variables, instead, are denoted by $x^{b}_{j_b}$, where $b = 1, \ldots, E$ and $j_b$ is an index that runs over the possible discretized values of variable $b$. They are characterized by exposure probabilities denoted by $P^{E}(x^{b}_{j_b})$ (where again $\sum_{j_b} P^{E}(x^{b}_{j_b}) = 1$ for all $b$). It is worth noting that we present the mathematical description for discrete environmental variables only, to keep it simpler; the model is more general and applies to both continuous and discrete variables.
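Let us consider a particular individual characterized by the $(E + G)$ values $g^{a}_{i_a}$ and $x^{b}_{j_b}$. In general, the disease risk $R$ is a function of all of them. The disease risk for such an individual is defined by the conditional probability

$$R(g^{1}_{i_1}, \ldots, g^{G}_{i_G}; x^{1}_{j_1}, \ldots, x^{E}_{j_E}) = P(\text{affected} \mid g^{1}_{i_1}, \ldots, g^{G}_{i_G}; x^{1}_{j_1}, \ldots, x^{E}_{j_E}) \quad (1)$$

where $P(\text{affected} \mid \cdot)$ is the probability that the individual is affected. In our model we assume a logistic expression for $R$:

$$R(g^{1}_{i_1}, \ldots, g^{G}_{i_G}; x^{1}_{j_1}, \ldots, x^{E}_{j_E}) = \frac{1}{1 + \exp\left[-\left(\alpha_{i_1 \cdots i_G} + \sum_{b=1}^{E} \beta^{b}_{i_1 \cdots i_G}\, x^{b}_{j_b}\right)\right]} \quad (2)$$

where $\alpha_{i_1 \cdots i_G}$ and $\beta^{b}_{i_1 \cdots i_G}$ are free parameters determined by the genetic factors and governing the shape of the function. Figure 1 shows an example of the model for two genetic factors and one environmental factor interacting.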
Figure 1. Multi-logistic model applied to a two genetic-one environmental factor condition. The y-axis reports the disease risk (R) and the x-axis the level of exposure to the environmental factor. The relationship is modelled by Eq. 2. For each combination of genetic factors there are different $\alpha_i$ and $\beta_i$ that model the relationship between environmental exposure and disease risk.
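A minimal Python sketch of Eq. 2 for an arbitrary number of genetic and environmental factors may help to fix ideas. It assumes the standard logistic form R = 1/(1 + exp(-(α + Σ_b β^b x^b))), with one intercept and one slope per environmental factor for every combination of genotypes; the function name `mlm_risk` and the dictionary layout are hypothetical and are not taken from the GENS Matlab scripts.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical layout: one (alpha, betas) pair per genotype combination.
# Keys are tuples (i_1, ..., i_G) with each i_a in {1, 2, 3};
# betas holds one slope per environmental factor b = 1, ..., E.
Coefficients = Dict[Tuple[int, ...], Tuple[float, Sequence[float]]]


def mlm_risk(coeffs: Coefficients,
             genotypes: Tuple[int, ...],
             exposures: Sequence[float]) -> float:
    """Disease risk of one individual under a multi-logistic model."""
    alpha, betas = coeffs[genotypes]  # the genotypes select the logistic function
    if len(betas) != len(exposures):
        raise ValueError("one beta per environmental factor is required")
    # Linear predictor on the log-odds scale.
    eta = alpha + sum(b * x for b, x in zip(betas, exposures))
    return 1.0 / (1.0 + math.exp(-eta))


# Two genes and one environmental factor: 3 x 3 = 9 logistic functions.
# Illustrative nonlinear pattern: only carriers of aa at both loci
# respond to the exposure.
example = {(i1, i2): (-3.0, [0.0]) for i1 in (1, 2, 3) for i2 in (1, 2, 3)}
example[(3, 3)] = (-3.0, [0.8])

r_ref = mlm_risk(example, (1, 1), [2.0])
r_double_aa = mlm_risk(example, (3, 3), [2.0])
```

Adding a gene multiplies the number of keys by three, while adding an environmental factor only lengthens each `betas` list, mirroring how the text describes extending the model.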
Gene-Environment iNteraction Simulator
We implemented the MLM in the Gene-Environment iNteraction Simulator (GENS). For the sake of simplicity, in this phase we describe a simple interaction between one genetic and one environmental factor, even though we continue to describe each individual by an $(E + G)$-tuple of characteristics. As a consequence of this choice, the MLM takes a simpler form. In particular, we can drop the indexes $a$ and $b$ in the expression of disease risk (2). Thus, denoting by $g_i$ the genotype of the chosen gene and by $x_j$ the exposure level of the environmental factor involved, we have

$$R(g_i, x_j) = \frac{1}{1 + \exp\left[-\left(\alpha_i + \beta_i\, x_j\right)\right]}, \qquad i = 1, 2, 3 \quad (3)$$

In other words, the MLM reduces to three logistic functions, one for each genotype.
It is possible to think of $\alpha_i$ as the basal genetic disease risk (on the log-odds scale) in individuals with that genotype: the greater $\alpha_i$, the greater the disease risk, independently of the contribution of the environmental factor. In particular, for vanishing $\alpha_i$ there is no basal genetic contribution and the variation of risk is ascribed entirely to the environmental exposure ($x_j$). Analogously, $\beta_i$ is the coefficient associated with the environmental exposure: the greater $\beta_i$, the greater the risk associated with an increase in the environmental exposure. In other words, $\beta_i$ models, for genotype $i$, the susceptibility to the environmental exposure. Consequently, for vanishing $\beta_i$ the environmental exposure has no effect on the disease risk.
To describe populations by standard epidemiological measures, we used the relative risk as the measure of the role of a genetic factor in the disease risk. In particular, we define the Total Risk ($TR$) of a specific genotype $i$ as

$$TR_i = \sum_{j} R(g_i, x_j)\, P^{E}(x_j) \quad (4)$$

(which holds under the hypothesis of independence among different environmental variables); one can then define the Relative Risk $RR_{kl} \equiv TR_k / TR_l$.
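Under the logistic form of Eq. 3, the total risk of a genotype is its logistic curve averaged over the exposure distribution. The Python sketch below, with hypothetical function names and made-up exposure levels, computes $TR_i$ for a discrete exposure distribution and the relative risk $RR_{kl}$; it additionally assumes that exposure is independent of genotype.

```python
import math
from typing import Sequence


def logistic_risk(alpha: float, beta: float, x: float) -> float:
    """R(g_i, x_j), assuming R = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def total_risk(alpha: float, beta: float,
               levels: Sequence[float], probs: Sequence[float]) -> float:
    """TR_i: the risk of one genotype averaged over P^E(x_j)."""
    if len(levels) != len(probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be a distribution over levels")
    return sum(logistic_risk(alpha, beta, x) * p for x, p in zip(levels, probs))


def relative_risk(tr_k: float, tr_l: float) -> float:
    """RR_kl = TR_k / TR_l."""
    return tr_k / tr_l


levels = [0.0, 1.0, 2.0, 3.0]   # hypothetical discretized exposure levels
probs = [0.4, 0.3, 0.2, 0.1]    # hypothetical P^E(x_j)
tr_1 = total_risk(-3.0, 0.5, levels, probs)  # reference genotype AA
tr_3 = total_risk(-2.0, 0.5, levels, probs)  # genotype aa, shifted curve
rr_31 = relative_risk(tr_3, tr_1)
```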
We take one homozygote as the reference (say AA, denoted by $i = 1$); the other homozygote (say aa, $i = 3$) has an equal or larger risk, and the heterozygote (Aa, $i = 2$) has a risk lying between those of the two homozygotes, that is $1 \le RR_{21} \le RR_{31}$. In particular, if the heterozygote has the same risk as the first homozygote, a recessive effect is simulated; if it has the same risk as the second homozygote, a dominant effect is simulated. Other situations are called codominant.
Formally, the relative risk of the heterozygote $RR_{21}$ is defined as

$$RR_{21} = 1 + W\,(RR_{31} - 1), \qquad 0 \le W \le 1 \quad (5)$$

where $W$ allows various inheritance effects to be modelled: recessive ($W = 0$), dominant ($W = 1$), and codominant ($0 < W < 1$) [17].
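The inheritance parametrization can be checked with a few lines of Python. The sketch assumes that Eq. 5 has the linear form $RR_{21} = 1 + W(RR_{31} - 1)$ with $0 \le W \le 1$, which reproduces the recessive ($W = 0$), dominant ($W = 1$) and codominant cases described in the text; the function names are hypothetical.

```python
def heterozygote_rr(rr_31: float, w: float) -> float:
    """RR_21 under the assumed form RR_21 = 1 + W * (RR_31 - 1)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("W must lie in [0, 1]")
    if rr_31 < 1.0:
        raise ValueError("aa must not have a lower risk than the reference AA")
    return 1.0 + w * (rr_31 - 1.0)


def inheritance_mode(w: float) -> str:
    """Label the inheritance effect encoded by W."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("W must lie in [0, 1]")
    if w == 0.0:
        return "recessive"   # Aa has the same risk as AA
    if w == 1.0:
        return "dominant"    # Aa has the same risk as aa
    return "codominant"      # Aa lies strictly between the homozygotes
```

For example, with $RR_{31} = 3$ and $W = 0.5$ the heterozygote gets $RR_{21} = 2$, halfway between the two homozygotes.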
The marginal risk of the environmental factor is input as the odds ratio for a one-unit increase in the level of exposure. This value is then transformed into the coefficients $\beta_i$ of the multi-logistic model. In any case, the user provides at most one $\beta_i$, and the tool derives the other values so that all the constraints are respected.
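Because the slope of a logistic function is the log odds ratio for a one-unit increase of its covariate, the conversion from the user's odds ratio to a slope can be sketched as below. This assumes the logistic form of Eq. 3 and an odds ratio given per unit of exposure; the helper names are hypothetical, and the check function verifies numerically that the odds ratio between $x + 1$ and $x$ equals $e^{\beta}$ whatever the intercept.

```python
import math


def beta_from_odds_ratio(odds_ratio: float) -> float:
    """Slope beta of the logistic curve for a given per-unit odds ratio."""
    if odds_ratio <= 0.0:
        raise ValueError("an odds ratio must be positive")
    return math.log(odds_ratio)


def odds(risk: float) -> float:
    """Odds corresponding to a probability of disease."""
    return risk / (1.0 - risk)


def per_unit_odds_ratio(alpha: float, beta: float, x: float) -> float:
    """Odds ratio between exposure x + 1 and x under R = 1/(1+exp(-(alpha+beta*x)))."""
    def risk(v: float) -> float:
        return 1.0 / (1.0 + math.exp(-(alpha + beta * v)))
    return odds(risk(x + 1.0)) / odds(risk(x))
```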
Type of GxE interaction
To describe GxE in biologically understandable terms, we consider a genetic-only model, an environmental-only model, and two interaction models that involve both genetic and environmental factors (Table 1 and Figure 2). The first two models are useful as references.
Table 1. Relationships among the coefficients of the Multi-Logistic Model and the type of interaction.
Figure 2. Types of GxE interaction modelled by KAPS. The y-axis reports the disease risk (R) and the x-axis the level of exposure to the environmental factor. The relationship is modelled by Eq. 3. For each genotype there are different $\alpha_i$ and $\beta_i$ that follow the specific constraints (Table 1). In the Environmental Model (EM), the disease risk depends only on the environmental exposure level, so the environment-risk relationship is the same across genotypes (same slope and no shift). In the Genetic Model (GM), the disease risk depends on the genetic factor only, so the environment plays no role in the disease risk (the curve is flat) while the risk differs across genotypes (height of the curve). In the Additive Model (AM), the disease risk depends on both genetic and environmental factors; the relationship between environmental exposure and disease risk is the same in each genotype (same slope), but each genotype has a different basal risk (shift). In the Gene-Environment interaction Model (GEM), the genetic factor influences the relationship between environmental exposure and disease risk (slope), but there is no difference in basal genetic risk (no shift).
In the first model, the Genetic Model (GM), individuals carrying a given genotype have the same disease risk regardless of the environmental exposure. This situation is modelled by giving a vanishing effect to the environmental variable, namely fixing all the $\beta_i$ equal to zero. In the second model, the Environmental Model (EM), the risk is due to the environmental exposure only. This situation is modelled by imposing $\alpha_i$ and $\beta_i$ equal across genotypes, with a non-vanishing $\beta_i$. This choice provides the same risk independently of the genotype carried.
The third model simulates the scenario where the gene modulates the response to the environment (Gene-Environment interaction Model, GEM). In this case the genotype does not directly affect the disease risk, but modulates the response to the environmental exposure. In other words, some genotypes are more prone than others to develop the disease when exposed to the same environmental level. In this interaction model all the $\alpha_i$ are equal (no direct genetic effect) while the $\beta_i$ differ. The last is the Additive Model (AM), where genetics and environment influence the risk directly, independently and additively, and the environmental exposure has the same effect in all genotypes (equal $\beta_i$). In this model there are no complex interactions between the genetics and the environmental exposure: the risk (on the log-odds scale) is the sum of the contribution due to the genetic predisposition and that due to the environment. Of course, the user can create further types of GxE by freely imposing $\alpha_i$ and $\beta_i$.
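The four reference models differ only in which coefficients are shared across genotypes (Table 1). The following Python sketch, with hypothetical names and a numerical tolerance chosen only for illustration, maps a set of coefficients $(\alpha_i, \beta_i)$, $i = 1, 2, 3$, onto the model it represents; the "null" label for the case with no effect at all is our own addition.

```python
import math
from typing import Sequence

TOL = 1e-12  # illustrative tolerance for treating coefficients as equal


def _all_equal(values: Sequence[float]) -> bool:
    return all(math.isclose(v, values[0], rel_tol=1e-9, abs_tol=TOL) for v in values)


def interaction_type(alpha: Sequence[float], beta: Sequence[float]) -> str:
    """Classify (alpha_i, beta_i), i = 1..3, as GM, EM, AM, GEM or other."""
    same_alpha, same_beta = _all_equal(alpha), _all_equal(beta)
    no_env = all(math.isclose(b, 0.0, abs_tol=TOL) for b in beta)
    if no_env:
        # Flat curves: only the genotype matters. If the alphas are also
        # equal, no factor affects the risk (outside the four models).
        return "null" if same_alpha else "GM"
    if same_alpha and same_beta:
        return "EM"       # same curve for every genotype
    if same_beta:
        return "AM"       # same slope, genotype-specific shift
    if same_alpha:
        return "GEM"      # same basal risk, genotype-specific slope
    return "other"        # any user-defined GxE
```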
Knowledge-Aided Parametrization System
To translate the population parameters into coefficients of the MLM, we implemented the Knowledge-Aided Parametrization System (KAPS). This module derives the values of $\alpha_i$ and $\beta_i$ from the genotype frequencies, the relative risk and model of inheritance of the genetic factor, the distribution and odds ratio of the environmental factor, the type of GxE, and the proportion of affected individuals in the sample ($m$).
The key issue is that the overall disease frequency in the population, $m$, is given by

$$m = \sum_{i=1}^{3} TR_i\, P^{G}(g_i) \quad (6)$$

Dividing Eq. 6 by $TR_1$ and using $TR_i = RR_{i1}\, TR_1$, it is straightforward to show that

$$TR_1 = \frac{m}{P^{G}(g_1) + RR_{21}\, P^{G}(g_2) + RR_{31}\, P^{G}(g_3)} \quad (7)$$

In a similar way it is possible to derive the expressions for the marginal risks of the other genotypes. By numerically solving this set of three equations (one for each $TR_i$) it is possible to obtain the $\alpha_i$ and $\beta_i$ coefficients that best match the user's requests.
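The first step of this derivation turns the user's epidemiological inputs into target total risks before any coefficient is fitted. The Python sketch below, with hypothetical names and example numbers, uses $m = \sum_i TR_i P^G(g_i)$ and $TR_i = RR_{i1} TR_1$ to compute $TR_1$, then $TR_2 = RR_{21} TR_1$ and $TR_3 = RR_{31} TR_1$; it rejects inputs for which a target total risk would not be a valid probability.

```python
import math
from typing import Sequence, Tuple


def target_total_risks(m: float, genotype_freqs: Sequence[float],
                       rr_21: float, rr_31: float) -> Tuple[float, float, float]:
    """Return (TR_1, TR_2, TR_3) such that sum_i TR_i * P^G(g_i) == m."""
    p1, p2, p3 = genotype_freqs
    if not math.isclose(p1 + p2 + p3, 1.0):
        raise ValueError("genotype frequencies must sum to 1")
    tr_1 = m / (p1 + rr_21 * p2 + rr_31 * p3)   # Eq. 6 divided by TR_1
    targets = (tr_1, rr_21 * tr_1, rr_31 * tr_1)
    if not all(0.0 < t < 1.0 for t in targets):
        raise ValueError("inputs imply a total risk outside (0, 1)")
    return targets


# Hardy-Weinberg frequencies for an allele frequency of 0.5,
# codominant heterozygote halfway between the homozygotes.
trs = target_total_risks(0.1, (0.25, 0.5, 0.25), rr_21=1.5, rr_31=2.0)
```

With these inputs the denominator is 1.5, so $TR_1 \approx 0.067$, $TR_2 = 0.1$ and $TR_3 \approx 0.133$.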
Algorithm and Implementation
The simulation procedure is divided into several steps (Figure 3). First, the genotypes of the $G$ genetic factors and the exposure levels of the $E$ environmental factors are assigned to the $N$ individuals.
Figure 3. Flowchart of GENS. Starting from the desired population characteristics, GENS assigns to each individual the genotypes of the genetic factors and the exposure levels of the environmental factors. In parallel, the KAPS module uses the population characteristics to compute the coefficients of the Multi-Logistic Model. Then, on the basis of the individual characteristics and the Multi-Logistic Model, the individual disease risk is computed. The last step is the assignment of the disease status (affected/not affected) to individuals according to their disease risks.
Then, the coefficients of the MLM are calculated from the sample population characteristics input by the user (Table 2). Finally, the disease risk and the disease status are assigned to each individual.
Table 2. Parameters required by GENS. Description of the parameters required by GENS to produce a simulated case-control sample. These parameters are translated into coefficients of the Multi-Logistic Model by the Knowledge-Aided Parametrization System.
Concerning the genetic factors, the user can provide the allelic frequencies or let the simulator select them at random (from a uniform distribution between 0.1 and 0.9). In both cases, the Hardy-Weinberg law is used to calculate the genotype frequencies. Afterwards, by means of a Monte Carlo method, the genotype of each genetic factor is randomly assigned to each individual according to the genotype frequencies. Similarly, for the environmental factors the user can choose a distribution function among a set of predefined ones, or provide an empirical distribution $P^{E}(x_j)$. Again, by a Monte Carlo process the exposures to the environmental factors are assigned according to these distribution functions.
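The genotype and exposure assignment described above can be sketched in Python with the standard `random` module. The sketch assumes that the supplied frequency refers to allele A, encodes genotypes as 1 (AA), 2 (Aa) and 3 (aa) as in the text, and uses an empirical discrete exposure distribution; the function names, seed and example values are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def hardy_weinberg(p: float) -> Tuple[float, float, float]:
    """Genotype frequencies (AA, Aa, aa) for an allele A with frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def assign_genotypes(n: int, rng: random.Random,
                     p: Optional[float] = None) -> Tuple[List[int], float]:
    """Monte Carlo draw of genotype codes 1 (AA), 2 (Aa), 3 (aa) for n individuals."""
    if p is None:
        p = rng.uniform(0.1, 0.9)   # random allele frequency, as in the text
    return rng.choices([1, 2, 3], weights=hardy_weinberg(p), k=n), p


def assign_exposures(n: int, rng: random.Random, levels: Sequence[float],
                     probs: Sequence[float]) -> List[float]:
    """Monte Carlo draw of exposure levels from an empirical P^E(x_j)."""
    return rng.choices(list(levels), weights=probs, k=n)


rng = random.Random(2010)           # fixed seed for reproducibility
genotypes, p_used = assign_genotypes(10_000, rng, p=0.3)
exposures = assign_exposures(10_000, rng, [0.0, 1.0, 2.0, 3.0], [0.4, 0.3, 0.2, 0.1])
```

Running the same code with a different seed yields a different sample with the same underlying characteristics.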
After the genetic and environmental factors have been assigned to the individuals, the next step is the assignment of the phenotype. For this process the system computes the coefficients of the MLM that link the population characteristics, the type of GxE interaction, and the disease risk. The actual computation of the coefficients is performed by KAPS, which, by means of Eq. 7 and the analogous expressions for $TR_2$ and $TR_3$, numerically solves the resulting system of three equations and returns $\alpha_1$, $\alpha_2$, $\alpha_3$, $\beta_1$, $\beta_2$ and $\beta_3$.
The disease risk ($0 \le R(g_i, x_j) \le 1$) is assigned by the MLM (Eq. 3) using the parameters previously identified. In particular, for each individual the genotype $i$ selects the coefficients $\alpha_i$ and $\beta_i$ computed by KAPS, while the exposure level provides the value of the covariate $x_j$. The last step is to assign a disease status (affected/not affected) to each individual. Again by a Monte Carlo process, the system generates a random number with uniform distribution in [0, 1] and assigns to the individual the status 1 (affected) if this number is less than the individual's risk $R(g_i, x_j)$, or 0 (not affected) otherwise.
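The phenotype assignment step can be sketched in Python as follows. The sketch assumes the logistic form of Eq. 3, genotypes coded 1 to 3, and Python's `random.random()`, which draws from [0, 1) rather than the closed interval; the coefficients, seed, sample size and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def individual_risk(alpha: Sequence[float], beta: Sequence[float],
                    genotype: int, exposure: float) -> float:
    """R(g_i, x_j) under the assumed logistic form; genotype coded 1, 2, 3."""
    i = genotype - 1   # the genotype selects (alpha_i, beta_i)
    return 1.0 / (1.0 + math.exp(-(alpha[i] + beta[i] * exposure)))


def disease_status(risks: Sequence[float], rng: random.Random) -> List[int]:
    """Monte Carlo phenotype: 1 (affected) if U < R, 0 (not affected) otherwise."""
    return [1 if rng.random() < r else 0 for r in risks]


rng = random.Random(7)
alpha, beta = [-2.0, -1.5, -1.0], [0.4, 0.4, 0.4]   # hypothetical AM coefficients
genotypes = [rng.choice([1, 2, 3]) for _ in range(5_000)]
exposures = [rng.choice([0.0, 1.0, 2.0]) for _ in range(5_000)]
risks = [individual_risk(alpha, beta, g, x) for g, x in zip(genotypes, exposures)]
status = disease_status(risks, rng)
```

On average the fraction of affected individuals tracks the mean risk, while each run differs randomly.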
An implementation of GENS is freely available on SourceForge (https://sourceforge.net/projects/gensim) as a set of Matlab 7.0 scripts that can be freely modified to address different requirements (different risk functions, multi-locus interactions, etc.).
Discussion and Conclusions
In this article we present a novel mathematical approach to model GxE in complex diseases. This approach is based on a Multi-Logistic Model (MLM) and is specifically tailored to model disease risk in data sets that simulate case-control samples. We implemented this method in the Gene-Environment iNteraction Simulator (GENS), a tool designed to yield case-control samples with GxE. These tools could be useful to generate simulated data sets in order to assess the performance of statistical methods.
The need for simulated populations is due to the difficulty of obtaining real populations in which enough of the parameters related to the phenotype are known. Furthermore, during the design of a statistical study, simulated populations can also be used to estimate the expected statistical power under different types of GxE [26]. We focused on inputting characteristics extracted from real populations (such as allelic frequencies, environmental factor distributions, risks given by genetic and environmental factors, etc.). In this way it is also easy to replicate real populations and evaluate how the statistical power changes with parameters such as the sample size ($N$), the disease frequency ($m$) and the type of GxE.
The key idea underlying the MLM is to model the disease risk in each combination of genetic factors (genotypes) as a different mathematical function of the environmental exposures (Figure 1). In this way it is possible to model any type of interaction between genetic and environmental factors, including complex and nonlinear ones. We based our approach on the logistic function, which is widely used in epidemiological studies and has several advantages. It follows the Weber-Fechner law and, as the value of the risk factor increases, it naturally ranges from 0 to 1 [27]. Moreover, the coefficient of each covariate (in this case the environmental factor) corresponds to the logarithm of the odds ratio due to a one-unit increase [27]. In particular, to calculate the disease risk, the genetic factors of an individual set the coefficients of the function, while the environmental factors assign values to its covariates.
We implemented the MLM in the Gene-Environment iNteraction Simulator, a GxE simulator for case-control studies. The intended audience of GENS is the biomedical community, so the main efforts have been to describe populations by standard epidemiological measures, to implement constraints that make the simulator's behaviour biologically meaningful, and to define GxE in biologically understandable terms. In theory, the MLM can model interactions among multiple genetic and multiple environmental factors. However, for the sake of simplicity we focused on an interaction between one genetic and one environmental factor, which makes it much easier to use standard epidemiological measures as input. Nevertheless, even in this simple situation, the handling of the interaction is not straightforward. Furthermore, in the simulated populations, besides the factors involved there are others that act as a noisy background, as frequently occurs in real data sets.
Even in this simple scenario, modelling the desired characteristics of a population can be very difficult, except for some particular and simple cases, mainly because several coefficients must be provided to the mathematical model. Having several coefficients that are hard to interpret is a common pitfall when modelling complex interactions. Therefore, to overcome this limitation we implemented the Knowledge-Aided Parametrization System (KAPS). This system exploits a set of reasonable biological constraints to reduce the complexity of the system. First, concerning the genetic factors, we imposed that the risk assigned to the heterozygote falls between those of the two homozygotes. Second, we adopted a qualitative description of the GxE. In particular, each type of GxE can be modelled as a set of equalities and inequalities of $\alpha_i$ and $\beta_i$ among genotypes. We predetermined two types of GxE, an additive (AM) and a modulative type (GEM). The user only has to select which type of GxE must be simulated, without providing additional information. In this way, we reduce the complexity of the system and, therefore, the degrees of freedom of the mathematical model. Finally, KAPS solves the system of equations to derive coefficients that comply with both the biological constraints and the population characteristics imposed by the user. As a consequence, to simulate a population only classical epidemiological parameters have to be provided (Table 2). However, the user can still simulate any kind of interaction by freely inputting all the coefficients of the MLM, and can even substitute the logistic expression with a different one.
In population genetics, data simulation has been used mainly to study population evolution, linkage disequilibrium, and pedigrees of Mendelian diseases [16-22]. Although some very interesting tools have been specifically designed for complex diseases [17,25], some limitations still exist; for example, they do not directly produce case-control data sets. GENS is specifically designed to produce, in a simple manner, case-control data sets as close as possible to real ones. In addition, unlike a naive logistic model, the MLM allows the modelling of nonlinear phenomena such as epistasis.
One of the shortcomings of GENS compared to other tools could be the limitation to one gene-one environment interactions. However, this choice was made because it makes the joint and single roles of the factors easier to describe and understand. It should be noted that this limitation applies mainly to the present implementation, in particular to KAPS. In fact, the multi-logistic model can easily be used to simulate multi-genetic-multi-environmental factor interactions by applying Eq. 2 and providing enough coefficients. The number of environmental factors is increased by adding covariates to the functions to account for their effects, whereas the number of genetic factors involved in the disease risk is increased by defining additional logistic functions in the multi-logistic model. For example, a file provided with the software contains the parameters of a nonlinear interaction among three genetic and two environmental factors. Furthermore, the multi-logistic model can be extended to use different functions for each combination of genetic factors.
As our approach is largely based on a Monte Carlo process, the system naturally takes into account the randomness present in any real data set obeying probabilistic laws. In other words, data sets created with the same characteristics differ randomly from one another.
In conclusion, with the multi-logistic model and GENS it is possible to simulate case-control samples of complex diseases where gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated populations and the Monte Carlo process allows random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of statistical power when designing a study.
Authors' contributions
RA, MP and GM conceived and developed the model. RA, DDA and GR implemented the scripts. MP and SC curated the biological aspects. GM, MN, GR and SC participated in the design and coordination of the overall study, and drafted the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We thank Michelle Kutzner for her help in the revision of the manuscript. RA is supported by a research doctorate in Computational Biology and Bioinformatics, University of Naples "Federico II".
References

Weeks DE, Lathrop GM: Polygenic disease: methods for mapping complex disease traits.
Trends Genet 1995, 11(12):513-9.

Group TGCR: New models of collaboration in genome-wide association studies: the Genetic Association Information Network.
Nat Genet 2007, 39(9):1045-51.

Lohmueller KE, Pearce CL, Pike M, Lander ES, Hirschhorn JN: Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease.
Nat Genet 2003, 33(2):177-82.

Hunter DJ: Gene-environment interactions in human diseases.
Nat Rev Genet 2005, 6(4):287-98.

Khoury MJ, Davis R, Gwinn M, Lindegren ML, Yoon P: Do we need genomic research for the prevention of common diseases with environmental causes?
Am J Epidemiol 2005, 161(9):799-805.

Hoh J, Ott J: Mathematical multilocus approaches to localizing complex human trait genes.
Nat Rev Genet 2003, 4(9):701-9.

Consortium TWTCC: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.
Nature 2007, 447(7145):661-78.

Murase Y, Yamada Y, Hirashiki A, Ichihara S, Kanda H, Watarai M, Takatsu F, Murohara T, Yokota M: Genetic risk and gene-environment interaction in coronary artery spasm in Japanese men and women.
Eur Heart J 2004, 25(11):970-977.

Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH: Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer.
Am J Hum Genet 2001, 69:138-47.

Hahn LW, Ritchie MD, Moore JH: Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions.
Bioinformatics 2003, 19(3):376-82.

An Alphabetic List of Genetic Analysis Software [http://www.nslijgenetics.org/soft/]

Gasbarra D, Sillanpää MJ, Arjas E: Backward simulation of ancestors of sampled individuals.
Theor Popul Biol 2005, 67(2):75-83.

Peng B, Amos CI, Kimmel M: Forward-Time Simulations of Human Populations with Complex Diseases.
PLoS Genetics 2007, 3(3):e47.

Carvajal-Rodriguez A: Simulation of genomes: a review.
Curr Genomics 2008, 9(3):155-159.

Leal SM, Yan K, Müller-Myhsok B: SimPed: a simulation program to generate haplotype and genotype data for pedigree structures.
Hum Hered 2005, 60(2):119-22.

Schmidt M, Hauser ER, Martin ER, Schmidt S: Extension of the SIMLA package for generating pedigrees with complex inheritance patterns: environmental covariates, gene-gene and gene-environment interaction.

Laird NM, Horvath S, Xu X: Implementing a unified approach to family-based tests of association.
Genet Epidemiol 2000, 19(Suppl 1):S36-42.

Terwilliger JD, Speer M, Ott J: Chromosome-based method for rapid computer simulation in human genetic linkage analysis.
Genet Epidemiol 1993, 10(4):217-24.

Ott J: Computer-simulation methods in human linkage analysis.
Proc Natl Acad Sci USA 1989, 86(11):4175-8.

Ploughman LM, Boehnke M: Estimating the power of a proposed linkage study for a complex genetic trait.
Am J Hum Genet 1989, 44(4):543-51.

Boehnke M: Estimating the power of a proposed linkage study: a practical computer simulation approach.
Am J Hum Genet 1986, 39(4):513-27.

Li C, Li M: GWAsimulator: a rapid whole-genome simulation program.
Bioinformatics 2008, 24:140-2.

Obaidat MS, Papadimitriou GI, Eds: Applied system simulation: methodologies and applications. Norwell, MA, USA: Kluwer Academic Publishers; 2003.

Gauderman WJ: Sample size requirements for matched case-control studies of gene-environment interaction.
Stat Med 2002, 21:35-50.

Garcia-Closas M, Lubin JH: Power and sample size calculations in case-control studies of gene-environment interactions: comments on different approaches.
Am J Epidemiol 1999, 149(8):689-92.

Hosmer DW, Lemeshow S: Applied logistic regression (Wiley Series in probability and statistics). Wiley-Interscience Publication; 2000.