Abstract
Background
Cockerham genetic models are commonly used in quantitative trait loci (QTL) analysis, with the special feature of partitioning genotypic variances into various genetic variance components, while F_{∞} genetic models are widely used in genetic association studies. Over the years, there has been some confusion about the relationship between these two types of models. A link between the additive, dominance and epistatic effects in an F_{∞} model and the additive, dominance and epistatic variance components in a Cockerham model has not been well established, especially when there are multiple QTL in the presence of epistasis and linkage disequilibrium (LD).
Results
In this paper, we further explore the differences and links between the F_{∞} and Cockerham models. First, we show that the Cockerham-type models are allele-based models with a special modification to correct a confounding problem. Several important moment functions, which are useful for partitioning variance components in Cockerham models, are also derived. Next, we discuss the properties of the F_{∞} models in partitioning genotypic variances and how they differ from those of the Cockerham models. Finally, for a two-locus biallelic QTL model with epistasis and LD between the loci, we present detailed formulas for calculating the genetic variance components in terms of the additive, dominance and epistatic effects in an F_{∞} model. A new way of linking the Cockerham and F_{∞} model parameters through their coding variables of genotypes is also proposed, which is especially useful when reduced F_{∞} models are applied.
Conclusion
The Cockerham-type models are allele-based models with a focus on partitioning genotypic variances into various genetic variance components, which are contributed by allelic effects and their interactions. By contrast, the F_{∞} regression models are genotype-based models focusing on modeling and testing of within-locus genotypic effects and locus-by-locus genotypic interactions. When there is no need to distinguish the paternal and maternal allelic effects, these two types of models are transferable. Transformation between an F_{∞} model's parameters and its corresponding Cockerham model's parameters can be established through a relationship between their coding variables of genotypes. Genetic variance components in terms of the additive, dominance and epistatic genetic effects in an F_{∞} model can then be calculated by translating formulas derived for the Cockerham models.
Background
Genetic models provide a basis for analyzing genetic properties in study populations. For quantitative traits, one type of model that has long been used in experimentally designed populations for the analysis of quantitative trait loci (QTL) is the so-called Fisherian or Cockerham model, which focuses on partitioning genotypic variances into additive, dominance and epistatic genetic variance components; its parameters are often called average allelic effects. Another popular model, widely used in genetic association studies, is the F_{∞} model, whose parameters are often defined as the additive, dominance and epistatic effects [1-4]. Over the years, there has been some confusion about the relationship between these two types of models [5-7]. The relationship between the additive, dominance and epistatic effects and the additive, dominance and epistatic genetic variance components has not been well established, especially when multiple QTL are involved in the presence of epistasis and linkage disequilibrium (LD). To shed some light on this issue, in this paper we further explore the differences and links between these two types of models.
In genetic studies, a partition of genotypic variance into additive, dominance and epistatic variance components provides a basis for better understanding how a quantitative trait is inherited from a parental population to its progeny. For example, the additive variance is the chief cause of resemblance between relatives. The genetic variance components also play a crucial role in studies of heritability, covariance between relatives, and variance components analysis. Fisher [8] proposed a least squares regression model for modeling QTL that partitions the genotypic variance into additive and dominance variance components, where the additive variance describes the variation contributed by the average substitution effect of a specific allele transmitted from a parent to its offspring, and the dominance variance is the portion of the genotypic variance due to the interaction of the two alleles from both parents. Cockerham [9,10] extended Fisher's model to multiple loci, redefining the additive, dominance and epistatic effects of QTL based on statistical orthogonal contrasts. Kempthorne [11,12] further generalized the model to multiple alleles. More recently, Mao et al. [13] extended the two-locus biallelic Cockerham model to allow for Hardy-Weinberg and linkage disequilibria, introducing 35 allelic effect parameters to describe various allelic effects and their interactions. Zeng et al. [7] introduced a general multilocus two-allele (G2A) model to represent the Cockerham model in a multiple regression setting, and compared several models for analyzing QTL effects and epistasis. Wang and Zeng [14] further extended the approach to multiple alleles and derived formulas for computing variance components in the presence of epistasis and LD.
The F_{∞} model focuses on direct modeling of genotypic values and testing for genotypic association of QTL with quantitative traits. There have been discussions on the classification of various penetrance modes of diseases based on the F_{∞} model and genotypic values [15]. In terms of its modeling scheme, as we will see later in this paper, the F_{∞} model is genotype-based, treating genotypes as different levels of the locus factors. Álvarez-Castro and Carlborg [16] also proposed a unified model that incorporates both genotypic and allelic effects into one framework. Meanwhile, there have been continuing efforts to model QTL effects and epistasis based on their biological functions [5,6,17]. More recently, several articles have addressed the partition of genotypic variances in F_{∞} models [7,18,19]. Under the assumption of linkage equilibrium, Tiwari and Elston [19] considered a two-locus biallelic F_{∞} model and derived formulas for computing genetic variance components in terms of the additive, dominance and epistatic genetic effects. Yang [18] discussed the impact of zygotic association on the partition of genotypic variance in F_{∞} models. Zeng et al. [7] compared the definitions of model parameters in the F_{∞} and Cockerham models, pointing out that the two types of models are different ways of modeling the genotypic values and that, under certain circumstances, their parameters are transferable from one to the other through their relationship with the genotypic values.
In this paper, we further explore the differences and links between the F_{∞} and Cockerham models in terms of their modeling schemes and their partitions of the genotypic variance. First, we clarify that the Cockerham-type models are in fact allele-based models with a special modification to correct a collinearity problem. Formulas for several moment functions in a two-locus biallelic Cockerham model are also derived, which are useful in deriving formulas for the genetic variance components. Next, we explore the differences between these two types of models, and their properties, in partitioning genotypic variances. We show that the traditional F_{∞} models are basically genotype-based models in which the additive and dominance effects can be confounded with each other in the partition of genotypic variances. This confounding usually does not affect association tests in a standard regression analysis, but it can make the partition of genotypic variances intricate, especially when locus-by-locus interactions are involved. We also introduce a mean-corrected F_{∞} model, which provides a partially orthogonal partition of the genotypic variance between loci under zygotic equilibrium, although its within-locus variances may still not be orthogonal because its coding variables of genotypes may be confounded. We discuss the pros and cons of the F_{∞} and Cockerham models in association analysis and in partitioning genotypic variances.
The second part of this paper concentrates on calculating the genetic variance components in terms of the additive, dominance and epistatic genetic effects in an F_{∞} model. Tiwari and Elston [19] derived formulas for computing genetic variance components in terms of these effects for a two-locus biallelic F_{∞} model under the assumption of linkage equilibrium. More recently, Zeng et al. [7] proposed a way of linking the two sets of model parameters through their relationship with the genotypic values when fully parameterized models are applied. Since formulas for partitioning the genotypic variance into additive, dominance and epistatic variances have been well established for Cockerham models [9,14], we can calculate the genetic variance components by translating the partition formulas derived for the equivalent Cockerham models. As examples, for a one-locus F_{∞} model with Hardy-Weinberg disequilibrium and a two-locus F_{∞} model with both epistasis and LD, we present detailed formulas for computing the genetic variance components in terms of the additive, dominance and epistatic effects together with allele frequencies and LD measures. We also propose an alternative way of transforming the additive, dominance and epistatic effects in an F_{∞} model into the average allelic effects in its corresponding Cockerham model through the coding variables of genotypes used in the two models, which is especially useful when reduced F_{∞} models are applied. Moreover, some practical issues relating to the use of reduced F_{∞} or Cockerham models are addressed.
Results
Genetic models
In the analysis of a quantitative trait, the observed phenotype can usually be expressed through the model
Y = G + E + G × E,
where Y is the phenotypic value, G is the genotypic value, E is the environmental deviation, and G × E is the genetic by environmental interaction. Adjustment for environmental deviation and genetic by environmental interaction can usually be achieved by incorporating suitable environmental covariates into the model. Therefore, in the rest of the paper, we omit E and G × E from the model and focus on modeling and analysis of the genotypic values.
Quantitative trait loci (QTL) are genes that contribute to the variation of a quantitative trait. In a study population, given specific genotypes g at the QTL under consideration, the genotypic value G(g) = E(G | g) is defined as the mean of individuals with genotypes g in the study population. In practice, the genotypic value G of an individual is unknown and needs to be estimated. Letting P_{g} denote the genotypic distribution of the QTL in the study population, a regression model can be expressed as
where the genotypic value G(g) is fixed given a specific genotype g. Since the QTL usually has a finite number of genotypes, G(g) itself can be treated as a discrete random variable that takes certain quantitative values with its distribution specified by P_{g}. Therefore,
With a large enough random sample from a study population, the genotype data from the sample approximately follow the same genotypic distribution P_{g}. The classical analysis of variance (ANOVA) or regression analysis is a typical tool for analyzing V_{G} and testing for possible association of genotypes at the QTL with the phenotypic trait. A fundamental question, then, is how to model the genotypic values G(g) given the QTL genotypes.
In the human genome, an individual always carries two alleles at a QTL: one from the father and the other from the mother. It is possible that a disease is caused by a mutant allele inherited from one of the parents. To understand such inheritance from parents to offspring, a natural approach is to treat the paternal and maternal alleles as two different factors and assess their allelic effects. With this in mind, let us first consider a single QTL with two alleles, A and a. For each individual, we can define the following indicator variables to describe the transmission of alleles from the parents to the individual.
Then we can write down a simple regression model as
where g = (a, a') with a, a' being the paternal and maternal alleles, respectively. In practice, however, this model is not very useful because we usually cannot distinguish the paternal and maternal alleles in the observed genotype data, i.e., the so-called phase problem. But if the paternal and maternal alleles have the same effects, which is a reasonable assumption in most genetic studies, the above model simplifies to
G(g) = μ' + α'w'(g) + δ'v'(g),   (2)
where w'(g) = z_{M} + z_{F} is the number of A alleles in g, and v'(g) = z_{M}z_{F} equals 1 for genotype AA and 0 otherwise.
In this model, in terms of the genotypic values, we have α' = G_{Aa} − G_{aa}, δ' = (G_{AA} + G_{aa}) − 2G_{Aa}, and the reference point (or baseline) μ' = G_{aa} is the genotypic value of genotype aa.
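These relations can be checked numerically. The following Python sketch is an illustration only: the helper names (`model2_params`, `model2_value`) and the genotypic values are hypothetical. It codes each genotype by w' (number of A alleles) and v' (1 for AA only), recovers μ', α' and δ' from the three genotypic values, and confirms that model (2) reproduces them exactly.

```python
from fractions import Fraction as F

# Coding of model (2): w' = number of A alleles, v' = z_M * z_F (1 only for AA).
CODING_2 = {"AA": (2, 1), "Aa": (1, 0), "aa": (0, 0)}

def model2_params(G):
    """Return (mu', alpha', delta') from genotypic values keyed by 'AA', 'Aa', 'aa'."""
    mu = G["aa"]                                # baseline: genotypic value of aa
    alpha = G["Aa"] - G["aa"]                   # alpha' = G_Aa - G_aa
    delta = (G["AA"] + G["aa"]) - 2 * G["Aa"]   # delta' = (G_AA + G_aa) - 2 G_Aa
    return mu, alpha, delta

def model2_value(params, genotype):
    """Genotypic value predicted by model (2) for one genotype."""
    mu, alpha, delta = params
    w, v = CODING_2[genotype]
    return mu + alpha * w + delta * v

G = {"AA": F(12), "Aa": F(10), "aa": F(4)}      # illustrative values only
params = model2_params(G)
assert all(model2_value(params, g) == G[g] for g in G)
print(params)
```

Exact fractions are used so that the reparameterization can be checked with equality rather than floating-point tolerances.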
Typically, the additive genetic variance V_{A} is defined as the variation contributed by allelic effects alone, and the dominance genetic variance V_{D} as the variation contributed by the interaction of the paternal and maternal alleles. Under the assumption of Hardy-Weinberg equilibrium (HWE), it is well known that the genotypic variance has an orthogonal partition V_{G} = V_{A} + V_{D}, in which the dominance variance V_{D} is the deviation of the genetic variance attributable to the locus from the additive variance [4,20]. A first look at model (2) might lead us to believe that under HWE we would have an orthogonal partition V_{G} = V_{A} + V_{D} with V_{A} = V(α'w'(g)) and V_{D} = V(δ'v'(g)). However, this is not true, because the interaction term δ'v'(g) in model (2) is correlated with the additive term α'w'(g) through the positive correlation between z_{M} (or z_{F}) and v' = z_{M}z_{F}. Indeed, although the two indicator variables z_{M} and z_{F} are independent under HWE, we have Cov(z_{M}, z_{M}z_{F}) = Cov(z_{F}, z_{M}z_{F}) = V(z_{F})E(z_{M}) = p^{2}(1 − p), where p = p_{A} is the frequency of allele A. Therefore, the covariance between the two coding variables w' and v' is Cov(w', v') = Cov(z_{M} + z_{F}, z_{M}z_{F}) = 2p^{2}(1 − p), which means that w' and v' are positively correlated whenever 0 < p < 1. More generally, from the definitions of w' and v' above, we can show that Cov(w', v') = 2(1 − p)P_{AA}, regardless of whether HWE holds. Thus, model (2) partitions the genotypic variance as
V_{G} = V(α'w') + V(δ'v') + 2Cov(α'w', δ'v'),
with a portion of it contributed by the correlation between the effects α' and δ'. This problem, caused by using two correlated explanatory variables w' and v' in a multiple regression model, is often referred to as a confounding problem or, statistically, a multicollinearity problem. It tends to complicate both the partition of variance components and the interpretation of the regression coefficients, and in extreme cases it leads to large standard errors for the least squares estimates. To overcome this multicollinearity problem in partitioning genetic variances, one strategy is to apply mean corrections to the genotype coding variables [7,14]. If we introduce two mean-corrected index variables, x_{M} = z_{M} − p and x_{F} = z_{F} − p, we can build the following modified version of model (2):
G(g) = μ + αw(g) + δv(g),   (3)
where w(g) and v(g) are defined by
w(g) = x_{M} + x_{F},   v(g) = −2x_{M}x_{F}.
It should be pointed out that the index variable v defined above differs from the one we defined in [14] by a factor of −2; this scaling keeps the definition of δ consistent with the G2A model introduced in Zeng et al. [7], of which the standard F_{2} model is a special case.
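A short enumeration makes the contrast between the raw and mean-corrected codings concrete. This Python sketch (hypothetical helper names, illustrative frequencies) checks Cov(w', v') = 2(1 − p)P_{AA} for arbitrary genotype frequencies and shows that, under HWE, the covariance of the mean-corrected index variables w and v vanishes. It assumes w = x_{M} + x_{F} and v = −2x_{M}x_{F}; the scale of v does not affect whether that covariance is zero.

```python
from fractions import Fraction as F

def cov(triples):
    """Cov(X, Y) from (probability, x, y) triples."""
    ex = sum(pr * x for pr, x, _ in triples)
    ey = sum(pr * y for pr, _, y in triples)
    return sum(pr * x * y for pr, x, y in triples) - ex * ey

def cov_wprime_vprime(P_AA, P_Aa, P_aa):
    """Cov(w', v') for arbitrary genotype frequencies (w' = #A alleles, v' = 1 for AA)."""
    return cov([(P_AA, 2, 1), (P_Aa, 1, 0), (P_aa, 0, 0)])

def cov_w_v_hwe(p):
    """Cov(w, v) under HWE, assuming w = x_M + x_F and v = -2 x_M x_F with x = z - p."""
    q = 1 - p
    triples = []
    for z_m, pr_m in ((1, p), (0, q)):          # indicator z_M
        for z_f, pr_f in ((1, p), (0, q)):      # indicator z_F, independent of z_M under HWE
            x_m, x_f = z_m - p, z_f - p
            triples.append((pr_m * pr_f, x_m + x_f, -2 * x_m * x_f))
    return cov(triples)

# Arbitrary genotype frequencies (no HWE): Cov(w', v') = 2(1 - p) P_AA.
P_AA, P_Aa, P_aa = F(1, 2), F(1, 5), F(3, 10)
p = P_AA + P_Aa / 2
assert cov_wprime_vprime(P_AA, P_Aa, P_aa) == 2 * (1 - p) * P_AA

# Under HWE the raw codes stay correlated, the mean-corrected ones do not.
p = F(3, 10)
assert cov_wprime_vprime(p * p, 2 * p * (1 - p), (1 - p) ** 2) == 2 * p * p * (1 - p)
assert cov_w_v_hwe(p) == 0
```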
Model (3) is in fact a regression form of the Cockerham model in the one-QTL case [7]. Under HWE, the indicator variables z_{M} and z_{F} are independent, and so are the index variables x_{M} and x_{F}. Thus we now have Cov(w, v) = 0, which leads to the familiar orthogonal partition of the genotypic variance V_{G} = V_{A} + V_{D} with V_{A} = α^{2}V(w) = 2α^{2}pq and V_{D} = δ^{2}V(v) = 4δ^{2}p^{2}q^{2}, where q = 1 − p. Under Hardy-Weinberg disequilibrium, we can represent the genotype frequencies as P_{AA} = p^{2} + pqf, P_{Aa} = 2pq − 2pqf and P_{aa} = q^{2} + pqf, where f is a measure of departure from HWE. The genotypic variance then becomes V_{G} = V_{A} + V_{D} + 2Cov(A, D), with V_{A} = α^{2}V(w), V_{D} = δ^{2}V(v) and Cov(A, D) = αδCov(w, v), all evaluated under these genotype frequencies.
Returning to model (2), it is easy to see that the coding variables w', v' in model (2) and the index variables w, v in model (3) are related by w' = w + 2p and v' = pw − v/2 + p^{2}. Note that w' still specifies the additive effect, up to a constant shift, whereas v' includes a portion of w; this is why model (2) cannot provide an orthogonal partition of the genotypic variance under HWE. The positive correlation between w' and v' in model (2) also complicates the interpretation of the regression parameters α' and δ'. Using the method proposed in the next section, we can show that the parameters in models (2) and (3) are related by α' = α + 2pδ and δ' = −2δ. Thus, the additive effect α' in model (2) is actually a combination of the average allelic effect α and the dominance effect δ in the Cockerham model (3). For the partition of the genotypic variance under model (2), we have under HWE
V(δ'v') = 4p^{2}δ^{2}V(w) + V_{D},
where V(w) = 2pq. Note that V_{A} = α^{2}V(w) = 2pqα^{2}. Thus, the positive correlation between the coding variables w' and v' inflates V(δ'v') beyond V_{D}, and this excess is partly contributed by a portion of the additive variance. By using the mean-corrected index variables w and v, the Cockerham model separates the confounded effects of w' and v', at least under HWE, in the partition of the genotypic variance V_{G}. As a result, the dominance variance V_{D} in the Cockerham model (3) is the variation contributed by the interaction of the paternal and maternal alleles, in addition to the additive variance.
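The parameter relations and variance identities for the one-locus models can be verified exactly with rational arithmetic. The following Python sketch is a check under stated assumptions rather than the authors' procedure: it solves models (2) and (3) for illustrative genotypic values by Cramer's rule, assumes v = −2x_{M}x_{F} for the dominance index of model (3), and checks α' = α + 2pδ, δ' = −2δ and, under HWE, V(δ'v') = 4p^{2}δ^{2}V(w) + V_{D}. It also evaluates V(w), V(v) and Cov(w, v) under the Hardy-Weinberg disequilibrium frequencies P_{AA} = p^{2} + pqf, P_{Aa} = 2pq − 2pqf, P_{aa} = q^{2} + pqf and compares them with closed forms that are our own algebra for this coding, not formulas quoted from the paper.

```python
from fractions import Fraction as F

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve A x = b exactly by Cramer's rule."""
    d = F(det3(A))
    return [det3([row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(A)]) / d
            for k in range(3)]

def cockerham_codes(p):
    """(w, v) per genotype, assuming w = x_M + x_F and v = -2 x_M x_F."""
    q = 1 - p
    return {"AA": (2 * q, -2 * q * q), "Aa": (q - p, 2 * p * q), "aa": (-2 * p, -2 * p * p)}

def moments(P, codes):
    """V(w), V(v) and Cov(w, v) for genotype frequencies P."""
    def E(fn):
        return sum(P[g] * fn(*codes[g]) for g in P)
    ew, ev = E(lambda w, v: w), E(lambda w, v: v)
    return (E(lambda w, v: w * w) - ew ** 2,
            E(lambda w, v: v * v) - ev ** 2,
            E(lambda w, v: w * v) - ew * ev)

order = ("AA", "Aa", "aa")
G = {"AA": F(12), "Aa": F(10), "aa": F(4)}       # illustrative genotypic values
p = F(3, 10)
q = 1 - p
codes = cockerham_codes(p)
b = [G[g] for g in order]

# Model (2): columns (1, w', v'); model (3): columns (1, w, v).
mu_p, alpha_p, delta_p = solve3([[1, 2, 1], [1, 1, 0], [1, 0, 0]], b)
mu, alpha, delta = solve3([[1, *codes[g]] for g in order], b)
assert alpha_p == alpha + 2 * p * delta
assert delta_p == -2 * delta

# Under HWE: the baseline mu is the population mean, and
# V(delta' v') = 4 p^2 delta^2 V(w) + V_D with V(w) = 2pq and V(v) = 4 p^2 q^2.
hwe = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
assert mu == sum(hwe[g] * G[g] for g in order)
V_w, V_v, C_wv = moments(hwe, codes)
assert (V_w, V_v, C_wv) == (2 * p * q, 4 * p * p * q * q, 0)
var_vprime = p * p * (1 - p * p)                 # v' = 1 only for AA
assert delta_p ** 2 * var_vprime == 4 * p * p * delta ** 2 * V_w + delta ** 2 * V_v

# Under HWD with parameter f: closed forms from our own algebra (not quoted from the paper).
f = F(1, 10)
hwd = {"AA": p * p + p * q * f, "Aa": 2 * p * q * (1 - f), "aa": q * q + p * q * f}
assert moments(hwd, codes) == (2 * p * q * (1 + f),
                               4 * p * q * (p * q + f * (q - p) ** 2 - p * q * f ** 2),
                               4 * p * q * f * (p - q))
```

Under HWD the covariance 4pqf(p − q) vanishes when p = q = 1/2, which is consistent with the orthogonal partition discussed for that special case.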
The Cockerham model (3) can easily be extended to multiple loci. For example, consider two loci A and B with alleles A, a and B, b, respectively. We can define indicator variables:
and
for the two loci separately. By further introducing the corresponding mean-corrected index variables, obtained by subtracting p_{1} = P_{A} and p_{2} = P_{B} from the allele indicators at loci A and B, respectively, and assuming that paternal and maternal gametes (alleles and haplotypes) have the same genetic frequencies and effects, we obtain the following two-locus (G2A) Cockerham model [14]
where
Based on these mean-corrected index variables, this Cockerham model allows us to easily incorporate allele-related properties, such as HWE or linkage equilibrium, into the variance partition analysis [14]. For instance, since the x variables are centered at zero in the population, it is easy to see that all the components in model (4) are mutually uncorrelated under Hardy-Weinberg and linkage equilibria, which leads to an orthogonal partition of the variance components. In addition, the mean-corrected variables x defined above have some convenient properties that facilitate the derivation of formulas for various variance and covariance components. For example, for two loci A and B under HWE but with LD between them, we can show that for any integers m, n > 0
where q_{1} = 1 − p_{1}, q_{2} = 1 − p_{2} and D = P_{AB} − p_{1}p_{2}. These moment functions are quite useful for deriving formulas that partition the genotypic variance into allele-based variance components in the G2A Cockerham model above. Besides, under gametic equilibrium, the index variables of the paternal gamete are independent of those of the maternal gamete, so that moments of their products factorize into products of the paternal and maternal moments for any j, k = 1, 2 and integers m, n > 0. Moreover, the paternal and maternal index variables have the same moments, as we do not distinguish the paternal and maternal gametes.
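The closed-form moment functions are not reproduced here, but the moments of the mean-corrected allele indicators within a single gamete can always be evaluated by enumerating the four haplotypes AB, Ab, aB and ab, with frequencies p_{1}p_{2} + D, p_{1}q_{2} − D, q_{1}p_{2} − D and q_{1}q_{2} + D. The Python sketch below does this and compares the result with a compact identity, E(x_{1}^{m}x_{2}^{n}) = E(x_{1}^{m})E(x_{2}^{n}) + D[q_{1}^{m} − (−p_{1})^{m}][q_{2}^{n} − (−p_{2})^{n}], which follows from that haplotype parameterization. The identity and the function names are our own illustration and are not taken from the paper.

```python
from fractions import Fraction as F

def haplotype_moment(m, n, p1, p2, D):
    """E[x1^m x2^n] for one gamete, with x_j = z_j - p_j, by enumerating haplotypes."""
    q1, q2 = 1 - p1, 1 - p2
    haps = {(1, 1): p1 * p2 + D, (1, 0): p1 * q2 - D,
            (0, 1): q1 * p2 - D, (0, 0): q1 * q2 + D}
    return sum(freq * (z1 - p1) ** m * (z2 - p2) ** n for (z1, z2), freq in haps.items())

def single_moment(k, p):
    """E[x^k] at one biallelic locus, with x = z - p."""
    q = 1 - p
    return p * q ** k + q * (-p) ** k

def closed_form(m, n, p1, p2, D):
    # Identity implied by the haplotype parameterization above (our own algebra).
    q1, q2 = 1 - p1, 1 - p2
    return (single_moment(m, p1) * single_moment(n, p2)
            + D * (q1 ** m - (-p1) ** m) * (q2 ** n - (-p2) ** n))

p1, p2, D = F(3, 10), F(3, 5), F(1, 20)           # illustrative values with valid haplotype frequencies
assert haplotype_moment(1, 1, p1, p2, D) == D      # E[x1 x2] = D
for m in range(1, 5):
    for n in range(1, 5):
        assert haplotype_moment(m, n, p1, p2, D) == closed_form(m, n, p1, p2, D)
```

Because paternal and maternal gametes are independent under gametic equilibrium, moments involving both gametes are products of these single-gamete moments.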
Note that model (4) uses 9 parameters to model G(g), which takes 9 distinct genotypic values, so it is also a fully parameterized model. In other words, the model parameters E_{G2A·AB} = (μ, α_{1}, δ_{1}, α_{2}, δ_{2}, (αα), (αδ), (δα), (δδ))^{T} simply provide a reparameterization of the 9 genotypic values G_{AB} = (G_{22}, G_{21}, G_{20}, G_{12}, G_{11}, G_{10}, G_{02}, G_{01}, G_{00})^{T}, where G_{ij}, i, j = 0, 1, 2, denotes the genotypic value with i and j being the numbers of A and B alleles in the corresponding genotype. Using the same notation as in Zeng et al. [7], we have
As pointed out in [7], the above relationship holds regardless of whether the population is in linkage equilibrium or disequilibrium.
In genetic association studies, we are often interested in examining the association of genotypes at certain genetic markers or QTL with a disease phenotype. In this case, a standard approach is to fit a regression model in which the genotypes at each locus are treated as different levels of the locus factor. This leads to another popular type of model widely used in genetic association studies, the so-called F_{∞} models. Again, let us first consider the simple case of one locus with two alleles A and a. In this case, we have three possible genotypes AA, Aa and aa, and correspondingly three possible genotypic values G_{AA}, G_{Aa} and G_{aa}. The single-locus F_{∞} model is then given by [3,4]
G(g) = m + a for g = AA,   G(g) = m + d for g = Aa,   G(g) = m − a for g = aa,
where a and d are often called the additive and dominance effects, respectively. In terms of the genotypic values, the parameters are defined as m = (G_{AA} + G_{aa})/2, a = (G_{AA} − G_{aa})/2 and d = G_{Aa} − (G_{AA} + G_{aa})/2. This model is referred to as an F_{∞} model simply because the reference point m is the mean of the two homozygote genotypic values, which corresponds to the mean in an F_{∞} population [1,2].
The above model can also be written in regression form as
G(g) = m + aw*(g) + dv*(g),   (5)
where w*(g) and v*(g) are two coding functions of the genotype g, defined as w*(g) = 1, 0, −1 and v*(g) = 0, 1, 0 for g = AA, Aa, aa, respectively.
Since m, a and d in this model simply provide a reparameterization of the three genotypic values G_{AA}, G_{Aa} and G_{aa}, we can refer to a and d as the genotypic effects of the QTL, with m as a reference baseline.
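For illustration, the following Python sketch (hypothetical names, arbitrary genotypic values) computes m, a and d from the three genotypic values and confirms that the regression form (5), with w* coded 1, 0, −1 and v* coded 0, 1, 0 for AA, Aa, aa, reproduces them exactly.

```python
from fractions import Fraction as F

def finf_params(G):
    """(m, a, d) of the single-locus F-infinity model from genotypic values."""
    m = (G["AA"] + G["aa"]) / 2        # mid-point of the two homozygotes
    a = (G["AA"] - G["aa"]) / 2        # additive effect
    d = G["Aa"] - m                    # dominance deviation of the heterozygote
    return m, a, d

# '1, 0, -1' coding for w* and '0, 1, 0' coding for v*.
CODING_FINF = {"AA": (1, 0), "Aa": (0, 1), "aa": (-1, 0)}

G = {"AA": F(12), "Aa": F(10), "aa": F(4)}   # illustrative values only
m, a, d = finf_params(G)
for g, (w_star, v_star) in CODING_FINF.items():
    assert m + a * w_star + d * v_star == G[g]
```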
Statistically, to see whether the QTL genotypes are associated with a disease phenotype, we need to test whether G_{AA} = G_{Aa} = G_{aa} or, equivalently, the null hypothesis H_{0}: a = d = 0 against the alternative H_{a}: a ≠ 0 or d ≠ 0. The standard regression approach usually provides unbiased estimates of the model parameters and an appropriate test of H_{0} regardless of a possible correlation between w*(g) and v*(g), although it may give large standard errors for the least squares estimates when this correlation is very strong.
Now let us look at how model (5) partitions the genotypic variance. Since w* and v* are two coding variables for the three genotypes at the same locus, they are inherently correlated. In fact, letting P_{AA}, P_{Aa}, P_{aa} be the genotype frequencies, we can show that Cov(w*, v*) = P_{Aa}(P_{aa} − P_{AA}), which is nonzero whenever P_{Aa} > 0 and P_{aa} ≠ P_{AA}. These variables are also related to the coding variables w', v' in model (2) and the index variables w, v in model (3) by w* = w' − 1 = w + 2p − 1 and v* = w' − 2v' = (1 − 2p)w + v + 2pq.
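Therefore, under HWE, model (5) partitions the genotypic variance as V_{G} = V(aw*) + V(dv*) + 2Cov(aw*, dv*), where Cov(w*, v*) = 2pq(q − p).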
In terms of the model parameters, we can show that a = α − (1 − 2p)δ and d = δ. In summary, we draw the following conclusions.
• Model (5) usually provides a different partition of the genotypic variance V_{G }than the one from the Cockerham model (3).
• When P_{aa} = P_{AA}, model (5) gives an orthogonal partition of the genotypic variance, V_{G} = V(aw*) + V(dv*), which differs from the partition V_{G} = V_{A} + V_{D} in the Cockerham model (3) under HWE unless P_{AA} = P_{aa} = 1/4.
• The potential correlation between w* and v* often inflates V(dv*) beyond V_{D}; the excess is partly contributed by a portion of the additive variance.
• The dominance effect d is the same as the allelic interaction δ in the Cockerham model. As a result, V_{D }= 0 if d = 0.
• The additive effect a = 0 is equivalent to α = (1 − 2p)δ for the allelic effects in the Cockerham model. So a = 0 does not necessarily imply V_{A} = 0.
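The conclusions above rest on a few algebraic identities, which the following Python sketch checks with exact fractions. It assumes the G2A coding v = −2x_{M}x_{F} for model (3), builds genotypic values from arbitrary Cockerham parameters (μ, α, δ), and verifies w* = w + 2p − 1, v* = (1 − 2p)w + v + 2pq, a = α − (1 − 2p)δ, d = δ and Cov(w*, v*) = P_{Aa}(P_{aa} − P_{AA}). All numbers are illustrative.

```python
from fractions import Fraction as F

p = F(3, 10)
q = 1 - p
# (w, v) of model (3), assuming w = x_M + x_F and v = -2 x_M x_F.
codes3 = {"AA": (2 * q, -2 * q * q), "Aa": (q - p, 2 * p * q), "aa": (-2 * p, -2 * p * p)}
# (w*, v*) of model (5): '1, 0, -1' and '0, 1, 0' codings.
codes5 = {"AA": (1, 0), "Aa": (0, 1), "aa": (-1, 0)}

# The coding identities hold genotype by genotype.
for g in codes3:
    w, v = codes3[g]
    w_star, v_star = codes5[g]
    assert w_star == w + 2 * p - 1
    assert v_star == (1 - 2 * p) * w + v + 2 * p * q

# Genotypic values generated from arbitrary Cockerham parameters.
mu, alpha, delta = F(7), F(3), F(-2)
G = {g: mu + alpha * w + delta * v for g, (w, v) in codes3.items()}

# F-infinity effects recovered from those genotypic values.
m = (G["AA"] + G["aa"]) / 2
a = (G["AA"] - G["aa"]) / 2
d = G["Aa"] - m
assert a == alpha - (1 - 2 * p) * delta
assert d == delta

# Cov(w*, v*) under arbitrary (not necessarily HWE) genotype frequencies.
P = {"AA": F(1, 2), "Aa": F(1, 5), "aa": F(3, 10)}

def expect(fn):
    return sum(P[g] * fn(*codes5[g]) for g in P)

cov_star = expect(lambda w, v: w * v) - expect(lambda w, v: w) * expect(lambda w, v: v)
assert cov_star == P["Aa"] * (P["aa"] - P["AA"])
```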
Note also that applying mean corrections to the two coding variables w* and v* does not help to separate their confounding in this case, because dv* in model (5) is not an interaction term.
Extension of the F_{∞} model (5) to multiple QTL is straightforward. Again consider two loci A and B with alleles A, a and B, b, respectively. We can introduce variables w*_{j}(g) and v*_{j}(g), j = 1, 2, using the same '1, 0, −1' and '0, 1, 0' codings for the QTL genotypes at each locus. A two-locus F_{∞} model with epistasis is then
G(g) = m + a_{1}w*_{1} + d_{1}v*_{1} + a_{2}w*_{2} + d_{2}v*_{2} + aa·w*_{1}w*_{2} + ad·w*_{1}v*_{2} + da·v*_{1}w*_{2} + dd·v*_{1}v*_{2},   (6)
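Model (6) is also a fully parameterized model for the 9 genotypic values G_{AB}. As shown in Zeng et al. [7], this two-locus F_{∞} model can be written in matrix form as G_{AB} = S_{F∞}E_{F∞}, where E_{F∞} = (m, a_{1}, d_{1}, a_{2}, d_{2}, aa, ad, da, dd)^{T} and S_{F∞} is the 9 × 9 matrix whose rows contain the values of (1, w*_{1}, v*_{1}, w*_{2}, v*_{2}, w*_{1}w*_{2}, w*_{1}v*_{2}, v*_{1}w*_{2}, v*_{1}v*_{2}) for the nine genotypes in G_{AB}.
When we fit the above model in a regression framework, the expected value of the least squares estimates (LSE) of E_{F∞} is given by
E(Ê_{F∞}) = (S_{F∞}^{T}W_{AB}S_{F∞})^{-1}S_{F∞}^{T}W_{AB}G_{AB} = S_{F∞}^{-1}G_{AB} = E_{F∞},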
where W_{AB} = diag(P_{22}, P_{21}, P_{20}, P_{12}, P_{11}, P_{10}, P_{02}, P_{01}, P_{00}) is of full rank, with P_{ij} being the frequency of the genotype corresponding to G_{ij}, i, j = 0, 1, 2. So the LSE are unbiased estimates of the model parameters, regardless of whether there are Hardy-Weinberg or linkage disequilibria in the genotypic distribution P_{g}. However, as pointed out in Zeng et al. [7], in the presence of interaction effects the additive effect a_{1} can no longer be interpreted as half of the difference between the homozygote genotypic values G_{2} = E(G | AA) and G_{0} = E(G | aa) at locus A, nor can the dominance effect d_{1} be interpreted as the difference between the heterozygote genotypic value G_{1} = E(G | Aa) and the mean of the homozygote genotypic values G_{2} and G_{0}. In addition, the partition of the genotypic variance V_{G} is complex, because not only are the within-locus terms a_{j}w*_{j} and d_{j}v*_{j} correlated for j = 1, 2, but the within-locus terms {a_{j}w*_{j}, d_{j}v*_{j}} and the locus-by-locus interactions may also be correlated. As a result, even when the genotypes at loci A and B are independent (i.e., the so-called zygotic equilibrium between loci A and B [18]), the variance component V(a_{j}w*_{j} + d_{j}v*_{j}), j = 1, 2, cannot simply be interpreted as the variation contributed by locus j in the presence of interactions.
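Because model (6) is a full reparameterization, its nine effects can be recovered from the 3 × 3 table of genotypic values. Every term of (6) is a product of a locus-A code in {1, w*_{1}, v*_{1}} and a locus-B code in {1, w*_{2}, v*_{2}}, so the table factors as S_{1}CS_{1}^{T}, where S_{1} is the one-locus code matrix and C collects the effects; this factorized layout is our own device for the sketch, not necessarily the matrix form used by Zeng et al. [7]. The Python code below (hypothetical names, illustrative values) inverts that relation and illustrates the point made above: with epistasis, half the marginal homozygote contrast at locus A equals a_{1} + aa·E(w*_{2}) + ad·E(v*_{2}) rather than a_{1}.

```python
from fractions import Fraction as F

# One-locus F-infinity code matrix: rows = genotypes with 2, 1, 0 copies,
# columns = codes (1, w*, v*) with the '1, 0, -1' and '0, 1, 0' codings.
S1 = [[1, 1, 0], [1, 0, 1], [1, -1, 0]]
# Its inverse: m = (G2 + G0)/2, a = (G2 - G0)/2, d = G1 - (G2 + G0)/2.
S1_INV = [[F(1, 2), 0, F(1, 2)], [F(1, 2), 0, F(-1, 2)], [F(-1, 2), 1, F(-1, 2)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def two_locus_effects(G):
    """Effects of model (6) from G[i][j] (rows: AA, Aa, aa; columns: BB, Bb, bb).

    Returns C = S1^{-1} G S1^{-T}, indexed by codes (1, w*, v*) at locus A (rows)
    and locus B (columns): C[0][0]=m, C[1][0]=a1, C[2][0]=d1, C[0][1]=a2,
    C[0][2]=d2, C[1][1]=aa, C[1][2]=ad, C[2][1]=da, C[2][2]=dd.
    """
    return matmul(matmul(S1_INV, G), transpose(S1_INV))

G = [[F(20), F(14), F(10)],      # illustrative genotypic values
     [F(15), F(13), F(9)],
     [F(8), F(7), F(6)]]
C = two_locus_effects(G)
assert matmul(matmul(S1, C), transpose(S1)) == G     # exact reparameterization

# Half the marginal homozygote contrast at locus A, with locus-B genotype
# frequencies PB (assumed independent of locus A).
PB = [F(1, 2), F(3, 10), F(1, 5)]
G2 = sum(pb * g for pb, g in zip(PB, G[0]))
G0 = sum(pb * g for pb, g in zip(PB, G[2]))
E_w2, E_v2 = PB[0] - PB[2], PB[1]
assert (G2 - G0) / 2 == C[1][0] + C[1][1] * E_w2 + C[1][2] * E_v2
print("a1 =", C[1][0], " half marginal contrast =", (G2 - G0) / 2)
```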
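If we use the mean-corrected variables ξ_{j} = w*_{j} − E(w*_{j}) and η_{j} = v*_{j} − E(v*_{j}) in place of w*_{j} and v*_{j}, j = 1, 2, in the F_{∞} model (6), we obtain the following model:
G(g) = m' + a'_{1}ξ_{1} + d'_{1}η_{1} + a'_{2}ξ_{2} + d'_{2}η_{2} + aa'ξ_{1}ξ_{2} + ad'ξ_{1}η_{2} + da'η_{1}ξ_{2} + dd'η_{1}η_{2},   (7)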
where
As in the one-locus case, the mean-corrected variables ξ_{j} and η_{j} are very likely correlated within each locus j = 1, 2. But they can help to reduce the complexity of the variance partition in certain circumstances. For example, under zygotic equilibrium between loci A and B, {ξ_{1}, η_{1}} are independent of {ξ_{2}, η_{2}}, and {ξ_{j}, η_{j}, j = 1, 2} are also uncorrelated with the interactions {ξ_{1}ξ_{2}, ξ_{1}η_{2}, η_{1}ξ_{2}, η_{1}η_{2}}. As a result, the within-locus effects (a'_{j}ξ_{j} + d'_{j}η_{j}), j = 1, 2, and the locus-by-locus interactions (aa'ξ_{1}ξ_{2} + ad'ξ_{1}η_{2} + da'η_{1}ξ_{2} + dd'η_{1}η_{2}) as a whole are mutually orthogonal, although the interaction terms {aa'ξ_{1}ξ_{2}, ad'ξ_{1}η_{2}, da'η_{1}ξ_{2}, dd'η_{1}η_{2}} may still be correlated among themselves. Thus,
V_{G} = V(a'_{1}ξ_{1} + d'_{1}η_{1}) + V(a'_{2}ξ_{2} + d'_{2}η_{2}) + V(aa'ξ_{1}ξ_{2} + ad'ξ_{1}η_{2} + da'η_{1}ξ_{2} + dd'η_{1}η_{2}).
In general, for more than two loci under zygotic equilibria, we will have
In this case, V(a'_{j}ξ_{j} + d'_{j}η_{j}) is the variation contributed by the genotypes at locus j, while the variance of the interaction terms between loci j and k represents the variation contributed by the genotypic interactions between those two loci. We will refer to model (7) as a mean-corrected F_{∞} model. Interestingly, in an F_{2} population this mean-corrected F_{∞} model reduces to the classical F_{2} model as a special case. The same holds for the Cockerham model (4).
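The orthogonality argument can be checked by brute-force enumeration. The following Python sketch assumes zygotic equilibrium (the joint genotype distribution is the product of the two single-locus distributions), uses arbitrary genotype frequencies and arbitrary values for the effects of model (7), written a'_{j}, d'_{j}, aa', ad', da', dd', and confirms that the total genotypic variance equals the sum of the two within-locus variances and the variance of the interaction terms taken as a whole. The intercept m' is omitted because it does not affect any variance.

```python
from fractions import Fraction as F
from itertools import product

CODES = [(1, 0), (0, 1), (-1, 0)]        # (w*, v*) for 2, 1, 0 copies of the allele

def centred(P):
    """Mean-corrected (xi, eta) for each genotype, given genotype frequencies P."""
    ew = sum(pr * w for pr, (w, _) in zip(P, CODES))
    ev = sum(pr * v for pr, (_, v) in zip(P, CODES))
    return [(w - ew, v - ev) for w, v in CODES]

def variance(pairs):
    """Variance of a discrete variable given (probability, value) pairs."""
    mean = sum(pr * x for pr, x in pairs)
    return sum(pr * (x - mean) ** 2 for pr, x in pairs)

PA = [F(1, 5), F(1, 2), F(3, 10)]        # locus-A genotype frequencies (assumed)
PB = [F(2, 5), F(2, 5), F(1, 5)]         # locus-B genotype frequencies (assumed)
a1, d1, a2, d2 = F(2), F(1), F(-1), F(3)                 # a'_j, d'_j (arbitrary)
aa, ad, da, dd = F(1), F(-2), F(1, 2), F(2)              # aa', ad', da', dd' (arbitrary)

xA, xB = centred(PA), centred(PB)
within_a, within_b, inter, total = [], [], [], []
for i, j in product(range(3), repeat=2):
    pr = PA[i] * PB[j]                   # zygotic equilibrium: joint = product
    xi1, eta1 = xA[i]
    xi2, eta2 = xB[j]
    wa = a1 * xi1 + d1 * eta1
    wb = a2 * xi2 + d2 * eta2
    it = aa * xi1 * xi2 + ad * xi1 * eta2 + da * eta1 * xi2 + dd * eta1 * eta2
    within_a.append((pr, wa))
    within_b.append((pr, wb))
    inter.append((pr, it))
    total.append((pr, wa + wb + it))

V_total = variance(total)
assert V_total == variance(within_a) + variance(within_b) + variance(inter)
```

Replacing the product PA[i] * PB[j] by a joint table with dependence between loci would, in general, break the equality, which is the role of the zygotic-equilibrium assumption.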
We can also model multiple QTL by extending model (2) to multiple loci. For example, an allele-based two-locus biallelic model is given by
where w'_{j} and v'_{j} are coding variables defined in the same way as in model (2), for the two loci separately. This model is similar to the F_{∞} model (6) except that the coding variables of genotypes are defined differently. From the definitions of these coding variables, it is also easy to see that w*_{j} = w'_{j} − 1 and v*_{j} = w'_{j} − 2v'_{j}. We can show that the parameters in models (8) and (6) have the following relationship
Without locus-by-locus allelic interactions, we have a_{j} = α'_{j} + δ'_{j}/2 and d_{j} = −δ'_{j}/2 for j = 1, 2, where α'_{j} and δ'_{j} denote the allelic effects of locus j in model (8). In the presence of locus-by-locus allelic interactions, a_{j} = d_{j} = 0 is not equivalent to α'_{j} = δ'_{j} = 0. As alleles represent more basic levels of genetic factors than genotypes, allele-based models are inherently more general and can be used to examine specific allelic effects and their interactions. When phase information is available, we could also use separate indicator variables of alleles to specify the paternal and maternal origins of alleles, which could be very useful when paternal and maternal genes have different allelic effects and their interactions are of interest (e.g., genomic imprinting). On the other hand, the coefficients in an F_{∞} model are more closely associated with homozygosity and heterozygosity at the loci [2].
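In the one-locus case, the link between the allele-based effects of model (2) and the F_{∞} effects can be read off from the genotypic-value definitions: from α' = G_{Aa} − G_{aa}, δ' = (G_{AA} + G_{aa}) − 2G_{Aa}, a = (G_{AA} − G_{aa})/2 and d = G_{Aa} − (G_{AA} + G_{aa})/2, one gets a = α' + δ'/2 and d = −δ'/2. This is our own short derivation, offered as a hedged illustration of the relationship without locus-by-locus interactions; the Python sketch below (hypothetical names) checks it on two arbitrary sets of genotypic values.

```python
from fractions import Fraction as F

def model2_params(G):
    """(mu', alpha', delta') of the allele-based model (2)."""
    return G["aa"], G["Aa"] - G["aa"], (G["AA"] + G["aa"]) - 2 * G["Aa"]

def finf_params(G):
    """(m, a, d) of the single-locus F-infinity model (5)."""
    m = (G["AA"] + G["aa"]) / 2
    return m, (G["AA"] - G["aa"]) / 2, G["Aa"] - m

examples = [{"AA": F(12), "Aa": F(10), "aa": F(4)},      # illustrative values only
            {"AA": F(1), "Aa": F(-3), "aa": F(5, 2)}]
for G in examples:
    _, alpha_p, delta_p = model2_params(G)
    _, a, d = finf_params(G)
    assert a == alpha_p + delta_p / 2      # a = alpha' + delta'/2
    assert d == -delta_p / 2               # d = -delta'/2
```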
In regard to modeling schemes, a major difference between the F_{∞} and Cockerham models lies in whether genotypes or alleles are treated as the levels of the locus factors. The traditional F_{∞} models treat genotypes as levels of the locus factors, with the genotypic effects at each locus and the locus-by-locus genotypic interactions being of major interest. The Cockerham models treat alleles as levels of the locus factors, with a focus on partitioning genotypic variances into various genetic variance components; by using mean-corrected coding variables of alleles, they effectively reduce the confounding between allelic effects and their interactions in the partition of the genotypic variance. Both types of models can actually have two versions: one defined directly on the coding of genotypes (or allele types), and the other using mean-corrected index variables to reduce the confounding between the main effects and their interactions. The former, whether genotype-based or allele-based, have coding variables defined directly on genotypes or alleles, regardless of the genotypic or allelic distributions. The latter are based on mean-corrected index variables, which depend not only on the genotypes or allele types but also on the frequencies of these genotypes or alleles.
To distinguish the model parameters in these different models while staying consistent with current terminology, in the rest of this paper we refer to the additive, dominance and epistatic effects in a traditional F_{∞} model as the genotypic effects; the parameters in a mean-corrected F_{∞} model as the average genotypic effects, with their corresponding variance components as genotypic variance components; the parameters in an allele-based model defined on the coding variables of allele types (e.g., model (2) or (8)) as the allelic effects; and the parameters in the traditional (mean-corrected) Cockerham model as the average allelic effects, with their corresponding variance components as allelic variance components.
Models that directly use coding variables of genotypes or allele types are appealing in practice because of their simplicity. However, statistical tests of the genotypic or allelic effects based on p-values are highly dependent on the regression model, the distributional assumptions and the available sample size. A statistically significant genetic effect with a small p-value does not necessarily imply a clinically important finding. Besides, the definitions of model parameters may be inconsistent between a one-locus model and a two-locus model with epistasis [7]; that is, when a multilocus model with epistasis is applied, the interpretation of the additive and dominance effects based on a one-QTL model may change. On the other hand, models with mean-corrected index variables allow us to assess how much variation is actually contributed by particular genetic effects or interactions, which can provide useful information for judging clinical importance. A drawback of these mean-corrected models is that they bring genotype or allele frequencies into the design matrix of the regression, which contributes another source of variation in model fitting, since these frequencies must be estimated in practice. This can make it difficult to evaluate the variance of the estimated variance components.
The traditional (mean-corrected) Cockerham model provides an orthogonal partition of the genotypic variance into additive, dominance and epistatic variance components under HWE and linkage equilibrium, while under zygotic equilibrium the mean-corrected F_{∞ }model provides an orthogonal partition of the genotypic variance among different loci and locus-by-locus interactions. Which of the two mean-corrected models gives the simpler partition therefore depends on the equilibrium conditions in the sample. It is easy to see that linkage equilibrium between alleles at two QTL under HWE guarantees zygotic equilibrium of the genotypes at the two loci, but not vice versa. Thus, for multiple QTL under both linkage and Hardy-Weinberg equilibria, the Cockerham model is preferred; when genotypes at two loci are in zygotic equilibrium but the alleles are not in linkage equilibrium, a mean-corrected F_{∞ }model may be preferred. In general, neither model is always preferable for partitioning genotypic variance. However, because HWE is expected to hold, at least approximately, in most regions of the human genome, QTL in zygotic equilibrium but not in linkage equilibrium are possible but rare. In addition, the allelic variance components are important quantities for assessing covariance between relatives and are more closely related to the inheritance of quantitative traits. As a result, the allelic variance components of the Cockerham model are expected to be of primary research interest in most genetic variance component analyses.
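The orthogonality claim can be checked numerically at a single locus. The sketch below assumes Cockerham index variables w = (2q, q − p, −2p) and v = (−2q^{2}, 2pq, −2p^{2}) for genotypes (AA, Aa, aa), F_{∞ }codings x = (1, 0, −1) and z = (−1/2, 1/2, −1/2), and a Hardy-Weinberg disequilibrium coefficient f with P_{AA} = p^{2} + fpq; these codings are illustrative assumptions and may differ in sign or scale from those defined earlier in the paper. Under HWE, w and v are uncorrelated, whereas the F_{∞ }codings x and z are correlated whenever p ≠ 1/2, and w and v become correlated once f ≠ 0.

```python
def genotype_freqs(p, f=0.0):
    """Frequencies of AA, Aa, aa; f is an assumed HWD coefficient (f = 0 gives HWE)."""
    q = 1.0 - p
    return [p * p + f * p * q, 2.0 * p * q * (1.0 - f), q * q + f * p * q]


def cov(freqs, s, t):
    """Covariance of two coding variables over the genotype distribution."""
    ms = sum(f * a for f, a in zip(freqs, s))
    mt = sum(f * b for f, b in zip(freqs, t))
    return sum(f * (a - ms) * (b - mt) for f, a, b in zip(freqs, s, t))


def codings(p):
    """Assumed codings for genotypes (AA, Aa, aa)."""
    q = 1.0 - p
    w = [2.0 * q, q - p, -2.0 * p]                 # Cockerham additive index
    v = [-2.0 * q * q, 2.0 * p * q, -2.0 * p * p]  # Cockerham dominance index
    x = [1.0, 0.0, -1.0]                           # F∞ additive coding
    z = [-0.5, 0.5, -0.5]                          # F∞ dominance coding
    return w, v, x, z


p = 0.3
w, v, x, z = codings(p)
cov_wv_hwe = cov(genotype_freqs(p), w, v)         # ~0: orthogonal under HWE
cov_xz_hwe = cov(genotype_freqs(p), x, z)         # 2pq(q - p) = 0.168
cov_wv_hwd = cov(genotype_freqs(p, f=0.2), w, v)  # ~ -0.0672: correlated under HWD
```

The F_{∞ }covariance 2pq(q − p) is the within-locus correlation between additive and dominance codings that makes separate additive and dominance variances hard to interpret in F_{∞ }models.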
Genotypic effects and allelic variance components
In Zeng et al. [7], it was pointed out that the additive, dominance and epistatic effects in an F_{∞ }model and the average allelic effects in a Cockerham model are simply two different reparameterizations of the genotypic values. When fully parameterized models are applied, one set of parameters can be transformed into the other through their relationships with the genotypic values. Since the partition of genetic variance components based on Cockerham models is well established [14,21,22], a relationship between the genotypic effects in an F_{∞ }model and the average allelic effects in its corresponding Cockerham model allows us to compute the various allelic variance components in terms of genotypic effects, by translating the variance-partition formulas derived for Cockerham models. In this section, we present detailed formulas for computing the allelic variance components in terms of the genotypic effects for the one-locus F_{∞ }model (5) under Hardy-Weinberg disequilibrium and for the two-locus F_{∞ }model (6) with both epistasis and LD between the two loci. We also propose an alternative way of linking these two sets of parameters through the relationship between the coding variables of genotypes used in F_{∞ }models and the mean-corrected index variables used in Cockerham models. Some practical issues related to using reduced models instead of fully parameterized models are also addressed.
Let us start from the simple case of the single locus F_{∞ }model (5) and its equivalent Cockerham model (3). As pointed out in [7], we can build the relationship between the two sets of model parameters through the genotypic values. Since both models give a full parameterization of the three genotypic values G_{AA}, G_{Aa }and G_{aa}, based on the coding functions for the three genotypes, we have
With some simple algebra, we can show that the genotypic effects and the average allelic effects have the following relationship
where α is the same substitution effect of replacing allele a by A as presented in [4] (p.114). Replacing α, δ in the formula (4) by a, d, we obtain the following partition of V_{G }in terms of a, d in model (5)
Under HWE, f = 0, so V_{A }= 2pq [a + d(q − p)]^{2 }and V_{D }= (2pqd)^{2}. These are the same results presented in [4,20].
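The one-locus transformation and the HWE partition can be verified numerically. The sketch below assumes F_{∞ }codings x = (1, 0, −1), z = (−1/2, 1/2, −1/2) and Cockerham index variables w = (2q, q − p, −2p), v = (−2q^{2}, 2pq, −2p^{2}) for (AA, Aa, aa); these are illustrative assumptions rather than the paper's exact definitions. It solves the 3 × 3 system that equates the two full parameterizations of the genotypic values (Cramer's rule, to stay dependency-free) and then computes V_{A} = Var(αw) and V_{D} = Var(δv) under HWE.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve A y = b for a nonsingular 3x3 matrix A by Cramer's rule."""
    d = det3(A)
    return [det3([[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]) / d
            for k in range(3)]


def one_locus(m, a, d, p):
    q = 1.0 - p
    # Genotypic values of AA, Aa, aa from the F∞ model (assumed codings x, z)
    G = [m + a - d / 2, m + d / 2, m - a - d / 2]
    # Assumed Cockerham design matrix with columns (1, w, v)
    C = [[1.0, 2 * q, -2 * q * q],
         [1.0, q - p, 2 * p * q],
         [1.0, -2 * p, -2 * p * p]]
    mu, alpha, delta = solve3(C, G)

    P = [p * p, 2 * p * q, q * q]  # HWE genotype frequencies

    def var(vals):
        mean = sum(f * x for f, x in zip(P, vals))
        return sum(f * (x - mean) ** 2 for f, x in zip(P, vals))

    V_A = var([alpha * row[1] for row in C])
    V_D = var([delta * row[2] for row in C])
    return alpha, delta, V_A, V_D, var(G)


alpha, delta, V_A, V_D, V_G = one_locus(m=0.0, a=1.0, d=0.5, p=0.3)
# alpha = a + d(q - p) = 1.2, delta = d = 0.5
# V_A = 2pq*alpha^2 = 0.6048, V_D = (2pqd)^2 = 0.0441, V_G = V_A + V_D
```

Under these assumed codings the solution reproduces Falconer's substitution effect α = a + d(q − p) and δ = d, so the variance components agree with the closed forms above.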
Similarly, for the two-QTL model (6), its genotypic effects (m, a_{1}, d_{1}, a_{2}, d_{2}, aa, ad, da, dd) and the average allelic effects E_{G2A·AB }in its equivalent Cockerham model are related through the nine genotypic values, which yields
Assuming HWE at loci A and B but allowing LD between them, and applying the properties of the moment functions derived earlier, it can be shown that the variance and covariance components in terms of the average allelic effects in the two-locus Cockerham model (4) are as follows
where A_{1 }= α_{1 }w_{1}, D_{1 }= δ_{1 }v_{1}, A_{2 }= α_{2 }w_{2}, D_{2 }= δ_{2 }v_{2}, A_{1 }A_{2 }= (αα)w_{1 }w_{2}, A_{1 }D_{2 }= (αδ)w_{1 }v_{2}, D_{1 }A_{2 }= (δα)v_{1 }w_{2 }and D_{1 }D_{2 }= (δδ)v_{1 }v_{2}. Note that the covariance components arise from correlations among the various allelic effects and interactions, while the interactions contribute their own variances regardless of whether the alleles are in HWE or in LD. These results are similar to those we presented in [14] for a general G2A model, except that a more detailed partition of the variance components and their covariance structure is shown here. Note also that, to stay consistent with Zeng et al. [7], the index variables v_{1}, v_{2 }here differ by a factor of (−2) from those used in [14]; correspondingly, the coefficients involving v's in model (4) differ from those in [14] by a factor of (−2) or 4, depending on how many v's are involved. Replacing the allelic effects in the above formulas by the genotypic effects through their relationship (9), we obtain formulas for the variance and covariance components in terms of the genotypic effects. When loci A and B are in linkage equilibrium, D = 0 and we obtain exactly the result presented by Tiwari and Elston [19].
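The covariance structure of the eight terms A_{1}, D_{1}, A_{2}, D_{2}, A_{1}A_{2}, A_{1}D_{2}, D_{1}A_{2} and D_{1}D_{2} can be computed by brute-force enumeration. The sketch below assumes random union of gametes (so HWE holds at each locus), haplotype frequencies built from allele frequencies p_{A}, p_{B} and the disequilibrium D, and the illustrative index codings w = n − 2p and v = (−2q^{2}, 2pq, −2p^{2}) for n = 2, 1, 0 copies of the first allele. With D = 0 all off-diagonal covariances vanish; with D ≠ 0, for example, Cov(A_{1}, A_{2}) = 2Dα_{1}α_{2}.

```python
from itertools import product

TERMS = ["A1", "D1", "A2", "D2", "A1A2", "A1D2", "D1A2", "D1D2"]


def index_vars(n, p):
    """Assumed Cockerham index variables for n copies (0, 1, 2) of the first allele."""
    q = 1.0 - p
    w = n - 2.0 * p  # 2q, q - p, -2p for n = 2, 1, 0
    v = {2: -2.0 * q * q, 1: 2.0 * p * q, 0: -2.0 * p * p}[n]
    return w, v


def haplotypes(pA, pB, D):
    """Two-locus haplotype frequencies; 1 marks the first allele (A or B)."""
    qA, qB = 1.0 - pA, 1.0 - pB
    return {(1, 1): pA * pB + D, (1, 0): pA * qB - D,
            (0, 1): qA * pB - D, (0, 0): qA * qB + D}


def term_covariances(pA, pB, D, coef):
    """Covariance matrix of the eight Cockerham terms, ordered as TERMS.

    coef = (alpha1, delta1, alpha2, delta2, (aa), (ad), (da), (dd)).
    """
    a1, d1, a2, d2, aa, ad, da, dd = coef
    rows = []
    # Random union of two gametes drawn from the haplotype distribution
    for (g1, f1), (g2, f2) in product(haplotypes(pA, pB, D).items(), repeat=2):
        w1, v1 = index_vars(g1[0] + g2[0], pA)
        w2, v2 = index_vars(g1[1] + g2[1], pB)
        terms = [a1 * w1, d1 * v1, a2 * w2, d2 * v2,
                 aa * w1 * w2, ad * w1 * v2, da * v1 * w2, dd * v1 * v2]
        rows.append((f1 * f2, terms))
    k = len(TERMS)
    mean = [sum(f * t[i] for f, t in rows) for i in range(k)]
    return [[sum(f * (t[i] - mean[i]) * (t[j] - mean[j]) for f, t in rows)
             for j in range(k)] for i in range(k)]


coef = (1.0, 0.4, 0.5, 0.2, 0.3, 0.1, 0.1, 0.05)
cov_ld = term_covariances(0.3, 0.6, 0.05, coef)  # Cov(A1, A2) = 2*D*alpha1*alpha2 = 0.05
cov_le = term_covariances(0.3, 0.6, 0.0, coef)   # off-diagonals ~0 under linkage equilibrium
V_G = sum(map(sum, cov_ld))                      # variances plus twice the covariances
```

Enumeration is used here instead of closed-form moment functions so that the check does not depend on any particular formula; it scales poorly with the number of loci but is convenient for two.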
In genetic applications, fully parameterized models may not always be practical because of limited sample sizes, multiple QTL, or a large number of alleles or genotypes at certain QTL. Including all possible genotypic or allelic interactions can over-parameterize the genetic model and make it hard to fit. Collapsing some alleles or genotypes may simplify the model structure, but doing so can also increase the risk of missing informative signals, as the effects of true functional alleles can be attenuated by nonfunctional alleles. Alternatively, a simplified genetic model can be used that includes only lower-order terms such as additive effects, dominance effects and additive-by-additive interactions.
Consider a simplification of the previous two-locus F_{∞ }model that includes only the additive effects at the two loci and their additive-by-additive interaction. The reduced model is then given by
In this case, we have
A reduced model can be thought of as adding constraints on the genotypic values. From , we now have
and δ_{1 }= δ_{2 }= (α_{1 }δ_{2}) = (δ_{1 }α_{2}) = (δ_{1 }δ_{2}) = 0. Thus, when there is HWE at loci A, B and linkage equilibrium between loci A and B, the partition of genotypic variance is given by , with
and .
If there is HWE at loci A, B but LD between the two loci, we will still have the same , and . Besides,
So far, we have relied on the genotypic values to establish the relationship between the average allelic effects E_{G2A·AB }and the genotypic effects. Alternatively, we can establish this relationship through the coding variables used in the F_{∞ }models and the index variables used in the Cockerham models. It is easy to see that the index variables w*_{1}, w*_{2 }in the F_{∞ }model (6) and w_{1}, w_{2 }in the Cockerham model (4) have the following relationship
for i = 1, 2. So, replacing w*, v* in model (10) by w, v gives
which leads to the same results as shown before. If the reduced model includes dominance effects, then
It is easy to show that the relationship between the allelic effects β and the genotypic effects b is given by
Therefore, using relationship (11), we can easily transform an F_{∞ }model into its equivalent Cockerham model, or vice versa.
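For the additive plus additive-by-additive reduced model, the coding-variable route can be illustrated numerically. This sketch assumes the F_{∞ }additive coding w*_{i} = (1, 0, −1) and the Cockerham coding w_{i} = (2q_{i}, q_{i} − p_{i}, −2p_{i}), so that w*_{i} = w_{i} + (p_{i} − q_{i}); this specific form of the relationship is our assumption, not a quotation of the paper's equation. Substituting it into a_{1}w*_{1} + a_{2}w*_{2} + (aa)w*_{1}w*_{2} gives α_{1} = a_{1} + (aa)(p_{2} − q_{2}), α_{2} = a_{2} + (aa)(p_{1} − q_{1}) and (αα) = (aa). Under HWE and linkage equilibrium the resulting partition 2p_{1}q_{1}α_{1}^{2} + 2p_{2}q_{2}α_{2}^{2} + 4p_{1}q_{1}p_{2}q_{2}(aa)^{2} should match the genotypic variance obtained by direct enumeration.

```python
def reduced_partition(a1, a2, aa, p1, p2):
    """Additive + AxA F∞ model under HWE at both loci and linkage equilibrium.

    Returns (V_G by enumeration, V_G from the orthogonal Cockerham-type
    partition, (alpha1, alpha2, (alpha alpha))).
    """
    q1, q2 = 1.0 - p1, 1.0 - p2

    def hwe(p):
        return {2: p * p, 1: 2 * p * (1 - p), 0: (1 - p) ** 2}

    h1, h2 = hwe(p1), hwe(p2)
    # Direct enumeration of the nine genotypes with assumed coding x = n - 1
    mean = sq = 0.0
    for n1, f1 in h1.items():
        for n2, f2 in h2.items():
            x1, x2 = n1 - 1, n2 - 1
            g = a1 * x1 + a2 * x2 + aa * x1 * x2
            mean += f1 * f2 * g
            sq += f1 * f2 * g * g
    v_direct = sq - mean ** 2

    # Coefficients after substituting x_i = w_i + (p_i - q_i)
    alpha1 = a1 + aa * (p2 - q2)
    alpha2 = a2 + aa * (p1 - q1)
    v_partition = (2 * p1 * q1 * alpha1 ** 2 + 2 * p2 * q2 * alpha2 ** 2
                   + (2 * p1 * q1) * (2 * p2 * q2) * aa ** 2)
    return v_direct, v_partition, (alpha1, alpha2, aa)


v_direct, v_partition, (alpha1, alpha2, alpha_alpha) = reduced_partition(1.0, 0.5, 0.3, 0.3, 0.6)
# alpha1 = 1.06, alpha2 = 0.38, and the two variance calculations agree
```

The interaction coefficient carries over unchanged, while each additive coefficient absorbs part of the interaction through the allele frequency at the other locus.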
It must be pointed out that the above relationship between the genotypic effects and the average allelic effects holds only when the reduced F_{∞ }model specifies the genotypic values correctly. In practice, the true genotypic values are unknown and a reduced model can only approximate them. In this case, the least-squares estimates from fitting a reduced model are unbiased estimators of the partial regression coefficients, whose expectation is
where W_{AB }= diag(P_{22}, P_{21}, P_{20}, P_{12}, P_{11}, P_{10}, P_{02}, P_{01}, P_{00}) is as defined before and (·)^{g }denotes a generalized inverse of the matrix (·). In this case, the true parameters may depend not only on the genotypic values but also on the genotypic frequencies P_{g}, including possible allelic associations such as LD; this is a fundamental difference between statistical models and functional models, as claimed in [17]. Furthermore, in general only certain linear combinations of E_{G2A·AB }can be estimated from the reduced-model coefficients, because the matrix relating the two parameter sets may no longer be a nonsingular square matrix. Thus, some allelic variance components may not be directly estimable in terms of the genotypic effects of a reduced F_{∞ }model. Alternatively, when some allelic variance components can reasonably be ignored, we can start from a reduced Cockerham model and derive its corresponding reduced F_{∞ }model using relationship (11).
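The dependence of reduced-model coefficients on genotype frequencies can be illustrated with a weighted least-squares projection, assuming the expectation takes the usual form (X^{T}WX)^{g}X^{T}WG with X the reduced design (columns 1, x_{1}, x_{2}, x_{1}x_{2}), W the diagonal matrix of the nine genotype frequencies in the order of W_{AB}, and G the true genotypic values. The codings x = (1, 0, −1), z = (−1/2, 1/2, −1/2), the HWE and linkage-equilibrium frequencies, and the helper names are assumptions for illustration; `numpy.linalg.pinv` supplies the Moore-Penrose inverse, one particular generalized inverse.

```python
import numpy as np

# Assumed F∞ codings, keyed by n = copies of the first allele
X_CODE = {2: 1.0, 1: 0.0, 0: -1.0}
Z_CODE = {2: -0.5, 1: 0.5, 0: -0.5}


def genotype_rows(p1, p2):
    """Nine genotypes in the order of W_AB (P_22, P_21, ..., P_00); HWE + LE assumed."""
    def hwe(p):
        return {2: p * p, 1: 2 * p * (1 - p), 0: (1 - p) ** 2}
    h1, h2 = hwe(p1), hwe(p2)
    return [(n1, n2, h1[n1] * h2[n2]) for n1 in (2, 1, 0) for n2 in (2, 1, 0)]


def full_value(n1, n2, eff):
    """True genotypic value; eff = (m, a1, d1, a2, d2, aa, ad, da, dd)."""
    m, a1, d1, a2, d2, aa, ad, da, dd = eff
    x1, z1, x2, z2 = X_CODE[n1], Z_CODE[n1], X_CODE[n2], Z_CODE[n2]
    return (m + a1 * x1 + d1 * z1 + a2 * x2 + d2 * z2
            + aa * x1 * x2 + ad * x1 * z2 + da * z1 * x2 + dd * z1 * z2)


def reduced_target(eff, p1, p2):
    """(X'WX)^g X'WG for the reduced model with columns (1, x1, x2, x1*x2)."""
    rows = genotype_rows(p1, p2)
    X = np.array([[1.0, X_CODE[n1], X_CODE[n2], X_CODE[n1] * X_CODE[n2]]
                  for n1, n2, _ in rows])
    W = np.diag([f for _, _, f in rows])
    G = np.array([full_value(n1, n2, eff) for n1, n2, _ in rows])
    return np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ G


no_dom = (0.0, 1.0, 0.0, 0.5, 0.0, 0.3, 0.0, 0.0, 0.0)    # reduced model is correct
with_dom = (0.0, 1.0, 0.8, 0.5, 0.0, 0.3, 0.0, 0.0, 0.0)  # true model has d1 != 0
b_correct = reduced_target(no_dom, 0.3, 0.6)    # recovers (0, 1.0, 0.5, 0.3)
b_misspec = reduced_target(with_dom, 0.3, 0.6)  # a1 target becomes 1.0 + 0.8*(q1 - p1)
```

When the reduced model is correct, the target coefficients equal the true effects at any allele frequencies; when a dominance effect is omitted, the additive target shifts with p_{1}, which is the frequency dependence described above.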
Discussion
Nowadays, F_{∞ }models are widely used in genetic association studies to test for association of genotypes and their interactions with quantitative traits. Most current association studies, however, focus on reporting p-values from statistical tests of the additive, dominance and epistatic effects of QTL. As we have pointed out, an assessment of how much these genetic effects contribute to the overall genotypic variance is another important piece of information, which can be essential for judging clinical significance. Unfortunately, calculating either the genotypic or the allelic variance components for F_{∞ }models is not trivial, especially when QTL interactions and LD are involved.
In this paper, we first explored the modeling schemes of the F_{∞ }and Cockerham models. We showed that the F_{∞ }models are essentially genotype-based models that treat genotypes as different factor levels, while the Cockerham models are allele-based models with a special modification to correct a collinearity problem. The two models usually provide different partitions of the genotypic variance. Because of an inherent correlation between the additive and dominance effects within a locus in F_{∞ }models, the variances contributed by the within-locus additive and dominance effects are often confounded with each other. Therefore, separate assessment of the variation contributed by the additive and dominance effects within a locus is not very meaningful. To fully capture the genotypic contribution of a locus, the variation contributed by both additive and dominance effects should be assessed jointly.
We also pointed out that both the Cockerham and the F_{∞ }models come in two versions: one defined on coding variables for allele types or QTL genotypes, and the other using mean-corrected index variables. Using mean-corrected index variables can reduce the complexity of partitioning genotypic variances under either linkage or zygotic equilibrium. For example, the traditional (mean-corrected) Cockerham model provides an orthogonal partition of the genotypic variance into additive, dominance and epistatic variance components under HWE and linkage equilibrium, while under zygotic equilibrium a mean-corrected F_{∞ }model provides an orthogonal partition of the genotypic variance among different loci and locus-by-locus interactions. By introducing the mean-corrected index variables, we can easily fit a Cockerham model or a mean-corrected F_{∞ }model and compute the various allelic or genotypic variance and covariance components using the standard regression approach. We also noted that the classical F_{2 }model used in experimentally designed populations is a special case of both the traditional (mean-corrected) Cockerham model and the mean-corrected F_{∞ }model.
Using mean-correction to remove the confounding between main effects and their interactions in the partition of variance is a useful strategy that can also be applied to gene-by-environment interactions. Returning to the original model (1): just like the allelic effects and their interactions in model (2), the genetic main effect G and the gene-by-environment interaction G × E can be correlated. This correlation usually does not affect the association test of the gene-by-environment interaction. However, it can produce a covariance between G and G × E in the partition of the phenotypic variance even when the main effects of G and E are uncorrelated, which complicates evaluation of the variation contributed by G × E. If we mean-correct both G and E, we obtain an orthogonal partition of the phenotypic variance V_{Y }into contributions from G, E and G × E, as long as G and E are uncorrelated. Without these mean-corrections, the variance V(G × E) itself could misrepresent the variation contributed by G × E beyond the genetic and environmental variances V(G) and V(E).
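The effect of mean-correction on a gene-by-environment term can be seen with exact moments for small discrete distributions. This sketch assumes G is an allele count under HWE, E is a binary exposure, and G and E are independent (a stronger condition than the uncorrelatedness mentioned above, used here so that the product moments factorize); the function name and the trait model Y = b_{G}G + b_{E}E + b_{GE}G·E are illustrative. Without centering, Cov(G, G·E) = V(G)·E(E); after centering it vanishes, and V_{Y} splits into three non-overlapping parts.

```python
from itertools import product


def gxe_moments(dist_g, dist_e, b_g, b_e, b_ge):
    """Exact moments for Y = b_g*G + b_e*E + b_ge*G*E, with G and E independent.

    dist_g, dist_e: lists of (value, probability) pairs.
    """
    joint = [(g, e, pg * pe) for (g, pg), (e, pe) in product(dist_g, dist_e)]

    def mean(fn):
        return sum(w * fn(g, e) for g, e, w in joint)

    def cov(f1, f2):
        m1, m2 = mean(f1), mean(f2)
        return sum(w * (f1(g, e) - m1) * (f2(g, e) - m2) for g, e, w in joint)

    mu_g, mu_e = mean(lambda g, e: g), mean(lambda g, e: e)
    var_g = cov(lambda g, e: g, lambda g, e: g)
    var_e = cov(lambda g, e: e, lambda g, e: e)

    def y(g, e):
        return b_g * g + b_e * e + b_ge * g * e

    return {
        "mean_E": mu_e,
        "var_G": var_g,
        # Uncentered: G and G*E are correlated whenever E has a nonzero mean
        "cov_G_GE_raw": cov(lambda g, e: g, lambda g, e: g * e),
        # Centered: the covariance vanishes
        "cov_G_GE_centered": cov(lambda g, e: g - mu_g,
                                 lambda g, e: (g - mu_g) * (e - mu_e)),
        "var_Y": cov(y, y),
        # Orthogonal partition after rewriting Y in centered variables
        "var_Y_partition": ((b_g + b_ge * mu_e) ** 2 * var_g
                            + (b_e + b_ge * mu_g) ** 2 * var_e
                            + b_ge ** 2 * var_g * var_e),
    }


res = gxe_moments([(2, 0.09), (1, 0.42), (0, 0.49)], [(1, 0.3), (0, 0.7)],
                  b_g=1.0, b_e=0.5, b_ge=0.8)
```

Centering leaves the interaction coefficient unchanged but shifts the main-effect coefficients by b_{GE} times the mean of the other factor, which is why the centered partition uses (b_{G} + b_{GE}E(E)) and (b_{E} + b_{GE}E(G)).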
Because the Cockerham models are allele-based, allele-related properties such as HWE or linkage equilibrium can easily be incorporated into their variance components analysis. In this paper, we further explored some useful properties of the index variables and derived formulas for several important moment functions of a G2A model under LD. Similar results can be derived for more than two loci. With three loci A, B and C, for example, we can show that
for any integers n_{1}, n_{2}, n_{3 }> 0. For more than three loci in linkage disequilibrium, the moment functions become more complex, but they remain computationally feasible as long as the haplotype distribution in the sampled population is known.
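As a rough illustration of that feasibility, the sketch below evaluates moment functions E(w_{1}^{n_1} ⋯ w_{k}^{n_k}) directly from a haplotype frequency table, assuming random union of gametes and the additive index w_{i} = n_{i} − 2p_{i} (n_{i} copies of the first allele at locus i). It does not reproduce the paper's closed-form three-locus expressions; the function names and the tuple encoding of haplotypes are illustrative choices. For two loci it returns E(w_{1}w_{2}) = 2D, and for three loci in linkage equilibrium E(w_{1}w_{2}w_{3}) = 0.

```python
from itertools import product
from math import prod


def allele_freqs(hap):
    """First-allele frequency at each locus; hap maps 0/1 tuples to frequencies."""
    k = len(next(iter(hap)))
    return [sum(f for h, f in hap.items() if h[i] == 1) for i in range(k)]


def moment(hap, powers):
    """E[prod_i w_i ** powers[i]] under random union of two gametes.

    The cost grows as (number of haplotypes)^2, so the approach stays
    practical whenever the haplotype table itself is manageable.
    """
    p = allele_freqs(hap)
    total = 0.0
    for (h1, f1), (h2, f2) in product(hap.items(), repeat=2):
        ws = [h1[i] + h2[i] - 2.0 * p[i] for i in range(len(p))]
        total += f1 * f2 * prod(w ** n for w, n in zip(ws, powers))
    return total


# Two loci with p_A = 0.3, p_B = 0.6 and D = 0.05
h2 = {(1, 1): 0.23, (1, 0): 0.07, (0, 1): 0.37, (0, 0): 0.33}
m11 = moment(h2, [1, 1])  # 2D = 0.1
m20 = moment(h2, [2, 0])  # 2 p_A q_A = 0.42

# Three loci in linkage equilibrium: haplotype frequencies are products
ps = [0.3, 0.6, 0.2]
h3 = {h: prod(pi if a else 1.0 - pi for a, pi in zip(h, ps))
      for h in product((0, 1), repeat=3)}
m111 = moment(h3, [1, 1, 1])  # 0 without three-locus disequilibrium
```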
When there is no need to distinguish the paternal and maternal gametes, the F_{∞ }and Cockerham models are transferable. There are two ways of linking the genotypic effect parameters of an F_{∞ }model with the allelic effect parameters of its corresponding Cockerham model: through their relationships with the genotypic values, or through the relationship between the coding variables of genotypes used in the two types of models. Once the relationship between the genotypic effects and the allelic effects is established, we can calculate the allelic variance components of an F_{∞ }model using the partition formulas derived for its Cockerham model. Using this approach, we presented detailed formulas for partitioning genetic variances in terms of genotypic effects for a one-locus F_{∞ }model under Hardy-Weinberg disequilibrium and for a two-locus G2A model with epistasis and LD. Some practical issues related to using reduced rather than fully parameterized F_{∞ }models were also addressed.
Both the F_{∞ }and the Cockerham models are statistical models, as their parameters depend not only on the genotypic values but also on the genotypic distribution in the sampled population, especially when reduced forms of the models are used. On the other hand, several attempts have been made to model QTL effects and epistasis based on their biological functions. Cheverud and Routman [5] and Cheverud [6] introduced an unweighted F_{∞ }model and defined several specific types of epistasis termed "physiological epistasis". Hansen and Wagner [17] further examined genotype-based interactions and termed them "functional epistasis". In our opinion, these two kinds of models are different ways of modeling genetic effects and their interactions, and they serve different research interests. The function-based models could be very useful for analyzing the molecular functions of genes and their pathways. In genetic mapping studies, however, they are hindered by a lack of appropriate model-building tools and by the fact that genotypic values are statistically defined as expected means over the genotypic distribution in the sampled population. With a large body of available regression tools, the statistical models provide a powerful means of detecting at least relatively common genes with effects of sufficient magnitude, given limited sample sizes.
Conclusion
The Cockerham type models are allele-based models, whereas the F_{∞ }regression models are genotype-based models. When allelic effects and their interactions are of primary research interest, the Cockerham type models are recommended. As genotype-based models, the F_{∞ }models are most suitable for examining genotypic effects and their interactions. Since the allelic variance components are important quantities in assessing covariance between relatives, calculating and testing them helps to assess how much variation is actually contributed by the allelic effects at each locus and by the locus-by-locus allelic interactions, which could be a crucial piece of information for assessing clinical importance. For a Cockerham model with mean-corrected index variables, the allelic variance components can be estimated directly using the standard regression approach. For an F_{∞ }model, when there is no need to distinguish the paternal and maternal allelic effects, we can transform it into its corresponding Cockerham model through the relationship between their coding variables of genotypes. The allelic variance components of the F_{∞ }model can then be calculated either by fitting its equivalent Cockerham model or by translating formulas derived for the Cockerham model into terms of the additive, dominance and epistatic genetic effects of the F_{∞ }model. Both the F_{∞ }and Cockerham models provide a basis for QTL analysis. We believe that a better understanding of the differences and links between these two types of models will be helpful for genetic association mapping studies, variance components analysis and dissection of the genetic architecture of quantitative traits.
Authors' contributions
TW participated in the design of the study, conducted the derivation and drafted the manuscript. ZBZ participated in the design of the study and edited the manuscript. All authors read and approved the final manuscript.
Acknowledgements
The authors would like to thank two anonymous reviewers for their constructive comments, and Dr. Lizzy De Lobel in the Department of Applied Mathematics & Computer Science at Ghent University, Belgium, for his questions that motivated this study.
References

Hayman BI, Mather KM: The description of genetic interactions in continuous variation.
Biometrics 1955, 11:69-82.

Veen JH: Tests of non-allelic interaction and linkage for quantitative characters in generations derived from two diploid pure lines.

Mather K, Jinks JL: Biometrical genetics. London: Chapman & Hall; 1982.

Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. Harlow, UK: Longman; 1996.

Cheverud JM, Routman EJ: Epistasis and its contribution to genetic variance components.
Genetics 1995, 139:1455-1461.

Cheverud JM: Detecting epistasis among quantitative trait loci. In Epistasis and the Evolutionary Process. Edited by Wolf JBWM, Brodie EDI. Oxford Univ. Press, UK; 2000:58-81.

Zeng ZB, Wang T, Zou W: Modeling quantitative trait loci and interpretation of models.
Genetics 2005, 169:1711-1725.

Fisher RA: The correlation between relatives on the supposition of Mendelian inheritance.

Cockerham CC: An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present.
Genetics 1954, 39:859-882.

Cockerham CC: Effects of linkage on the covariances between relatives.
Genetics 1956, 41:138-141.

Kempthorne O: The correlation between relatives in a random mating population.
Proc Royal Soc B 1954, 143:103-113.

Kempthorne O: An Introduction to Genetic Statistics. New York: John Wiley & Sons; 1957.

Mao YC, London NR, Ma L, Dvorkin D, Da Y: Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model.
Physiol Genomics 2006, 28:46-52.

Wang T, Zeng ZB: Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium.

Li W, Reich J: A complete enumeration and classification of two-locus disease models.
Human Heredity 2000, 50:334-349.

Álvarez-Castro JM, Carlborg Ö: A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis.
Genetics 2007, 176:1151-1167.

Hansen TF, Wagner GP: Modeling genetic architecture: a multilinear theory of gene interaction.
Theoretical Population Biology 2001, 59:61-86.

Yang RC: Epistasis of Quantitative Trait Loci Under Different Gene Action Models.
Genetics 2004, 167:1493-1505.

Tiwari HK, Elston RC: Deriving components of genetic variance for multilocus models.
Genetic Epidemiology 1997, 14:1131-1136.

Lynch M, Walsh B: Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Associates; 1998.

Weir BS, Cockerham CC: Two-locus theory in quantitative genetics. In Proceedings of the international conference on quantitative genetics. Edited by E Pollack OK, Bailey TB. Iowa State Univ. Press, Ames; 1977:247-269.

Weir BS, Cockerham CC: Complete characterization of linkage disequilibrium at two loci. In Mathematical evolutionary theory. Edited by Feldman MW. Princeton, NJ: Princeton University Press; 1989:86-110.