
Open Access Methodology article

A quantitative genetic and epigenetic model of complex traits

Zhong Wang1,2, Zuoheng Wang3, Jianxin Wang4, Yihan Sui4, Jian Zhang1, Duanping Liao2 and Rongling Wu2*

Author Affiliations

1 Siyang Science and Technology Station, Yuanpeng Institute of Genome, Nantong, Jiangsu, 226019, China

2 Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA

3 Division of Biostatistics, Yale University, New Haven, CT, 06510, USA

4 Center for Computational Biology, Beijing Forestry University, Beijing, 100083, China


BMC Bioinformatics 2012, 13:274  doi:10.1186/1471-2105-13-274

The electronic version of this article is the complete one and can be found online at:

Received: 24 May 2012
Accepted: 1 October 2012
Published: 26 October 2012

© 2012 Wang et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Despite our increasing recognition of the mechanisms that specify and propagate epigenetic states of gene expression, the pattern of how epigenetic modifications contribute to the overall genetic variation of a phenotypic trait remains largely elusive.


We construct a quantitative model to explore the effect of epigenetic modifications that occur at specific rates on the genome. This model, which is derived from but extends beyond traditional quantitative genetic theory founded on Mendel’s laws, allows questions concerning the prevalence and importance of epigenetic variation to be incorporated and addressed.


It provides a new avenue for bringing chromatin inheritance into the realm of complex traits, facilitating our understanding of the means by which phenotypic variation is generated.


Systematic or stochastic changes in chromatin states, such as DNA methylation, chromatin remodeling, histone modification and RNA interference, have been thought to provide an additional driving force for phenotypic variation in complex traits and diseases [1-9]. Different chromatin states, called epialleles, that occur in the same sequence allele cannot be captured by an analysis based on DNA sequence alone [10]. With the increasing availability of epigenome technologies, there is an unprecedented opportunity to understand the role of epiallelic variants in maintaining and inducing functional variation that allows organisms to better buffer against environmental perturbations. This calls for quantitative models that can estimate the amount and pattern of quantitative variation determined by epialleles. Integrated with linkage or association mapping strategies, these models can recover epigenetic variation that cannot presently be estimated [10-13].

There have been several publications on methodological development for epigenetic detection [14-17]. Johannes and Colome-Tatche [16] proposed an approach for estimating epigenetic variation in experimental crosses derived from epigenomically perturbed isogenic lines. This approach can model the effects of epiallelic instability, recombination, parent-of-origin effects, and transgressive segregation on phenotypic variation across generations. Tal et al. [15] derived an expression for covariances between relatives due to epigenetic transmissibility. A statistical model based on multiple testing procedures has been developed to identify genomic regions of epigenetic variability among individuals from genome-wide DNA methylation data [18]. These model developments, in combination with empirical studies, can be used to test the hypothesis that epigenetic variation arising directly or indirectly from chromatin modifications of DNA is an important contributor to the missing heritability [17,19].

Despite these advances, it remains unclear how much of the phenotypic variation is contributed by epigenetic modifications and, more importantly, by what means epialleles exert their effects on phenotypic values. The motivation of this article is to develop a quantitative model for estimating and testing the contribution of epigenetic variants to quantitative trait variation. The model allows the prediction of how much genetic variation is produced through a change in the rate of occurrence of epigenetic mutation and in the effect of epigenetic factors in a natural population. We particularly discuss how the epigenetic effect interacts with other genetic effects, such as additive and dominant effects, to affect phenotypic traits. Implemented in genome-wide association studies [19], the proposed model provides useful guidance for designing efficient and effective molecular experiments to characterize a comprehensive picture of the epigenetic variation of complex traits or diseases in different organisms.


Occurrence rate of methylation

Consider an epigenetic study population of n individuals that are randomly drawn from a natural population, in which a nucleotide site, with two alleles A1 and A2, is thought to affect a phenotypic trait. Let p and q (p + q = 1) denote the allele frequencies of A1 and A2, respectively, in the natural population at Hardy-Weinberg equilibrium (HWE). The genotypic frequencies of A1A1, A1A2, and A2A2 at the nucleotide site studied are expressed as $p^2$, $2pq$, and $q^2$, respectively [20,21].

At the nucleotide site studied, some cytosines within a CpG dinucleotide are methylated by adding a methyl group to the 5 position of the cytosine pyrimidine ring. Without loss of generality, we assume that allele A1 is a cytosine which, when methylated, becomes a new “allele” called the epiallele, denoted as Ae; methylation occurs at a rate u. After DNA methylation, the population frequencies of the non-methylated A1 allele, the epiallele Ae and allele A2 are (1 – u)p, up, and q, respectively. Current technologies allow epialleles to be distinguished from non-methylated alleles. The process of methylation and the resulting frequencies of the six distinguishable genetic and epigenetic types are expressed as

\[
\begin{array}{llll}
\text{Genotype} & \text{Genotype/epigenotype} & \text{Frequency} & \text{Observation} \\
A_1A_1 & A_1A_1\ (\text{no methylation}) & (1-u)^2p^2 + D_{12} + D_{1e} & n_{11} \\
 & A_1A_e\ (\text{one methylation}) & 2u(1-u)p^2 - 2D_{1e} & n_{1e} \\
 & A_eA_e\ (\text{two methylations}) & u^2p^2 + D_{1e} + D_{2e} & n_{ee} \\
A_1A_2 & A_1A_2\ (\text{no methylation}) & 2(1-u)pq - 2D_{12} & n_{12} \\
 & A_2A_e\ (\text{one methylation}) & 2upq - 2D_{2e} & n_{2e} \\
A_2A_2 & A_2A_2\ (\text{no methylation}) & q^2 + D_{12} + D_{2e} & n_{22}
\end{array} \tag{1}
\]

where D12, D1e, and D2e are the coefficients of Hardy-Weinberg disequilibrium (HWD) due to a non-random association between alleles A1 and A2, between allele A1 and epiallele Ae, and between allele A2 and epiallele Ae, respectively. It is possible that the previous equilibrium of the population is violated by DNA methylation, leading to the HWD quantified by D12, D1e, and D2e. Thus, the genotype and epigenotype frequencies may be determined by allele and epiallele frequencies and HWD coefficients.

Let n11, n1e, nee, n12, n2e, and n22 (n11+n1e+nee+n12+n2e+n22 = n) denote the observed counts of the corresponding genotypes/epigenotypes in (1) in the study population. Based on the frequencies of these genotypes/epigenotypes, we formulate a multinomial likelihood from which to obtain the maximum likelihood estimates (MLEs) of the allele frequencies, the occurrence rate of methylation, and the HWD coefficients using

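\[
\hat{p} = \frac{n_{11} + n_{1e} + n_{ee} + \frac{1}{2}(n_{12} + n_{2e})}{n} \tag{2}
\]

\[
\hat{u} = \frac{n_{ee} + \frac{1}{2}(n_{1e} + n_{2e})}{n_{11} + n_{1e} + n_{ee} + \frac{1}{2}(n_{12} + n_{2e})} \tag{3}
\]

\[
\hat{q} = \frac{n_{22} + \frac{1}{2}(n_{12} + n_{2e})}{n} \tag{4}
\]

\[
\hat{D}_{1e} = \hat{u}(1-\hat{u})\hat{p}^2 - \frac{n_{1e}}{2n} \tag{5}
\]

\[
\hat{D}_{2e} = \hat{u}\hat{p}\hat{q} - \frac{n_{2e}}{2n} \tag{6}
\]

\[
\hat{D}_{12} = (1-\hat{u})\hat{p}\hat{q} - \frac{n_{12}}{2n} \tag{7}
\]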
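As a minimal sketch of how the closed-form estimates (2)-(7) can be evaluated from the six observed counts, the Python function below applies them directly. The function name, argument order and dictionary keys are illustrative choices and not part of the original model description.

```python
def estimate_population_parameters(n11, n1e, nee, n12, n2e, n22):
    """Evaluate equations (2)-(7) for a single site from genotype/epigenotype counts."""
    n = n11 + n1e + nee + n12 + n2e + n22
    # Per-individual count of alleles that were originally A1 (methylated or not).
    a1_origin = n11 + n1e + nee + 0.5 * (n12 + n2e)
    if n == 0 or a1_origin == 0:
        raise ValueError("need at least one individual carrying A1 or Ae")

    p = a1_origin / n                                  # equation (2)
    u = (nee + 0.5 * (n1e + n2e)) / a1_origin          # equation (3)
    q = (n22 + 0.5 * (n12 + n2e)) / n                  # equation (4)
    d1e = u * (1 - u) * p ** 2 - n1e / (2 * n)         # equation (5)
    d2e = u * p * q - n2e / (2 * n)                    # equation (6)
    d12 = (1 - u) * p * q - n12 / (2 * n)              # equation (7)
    return {"p": p, "u": u, "q": q, "D1e": d1e, "D2e": d2e, "D12": d12}


if __name__ == "__main__":
    # Counts chosen to match the expected frequencies in (1) with no disequilibrium.
    print(estimate_population_parameters(900, 600, 100, 3600, 1200, 3600))
```

When the counts match the frequencies in (1) with all disequilibrium coefficients equal to zero, the three estimated coefficients are zero up to floating-point error.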

We are interested in investigating whether DNA methylation occurs at a significant rate at the nucleotide site. This can be tested by formulating a null hypothesis, H0: u = 0, vs. an alternative hypothesis, H1: u ≠ 0, under which the likelihoods L0 and L1 are calculated, respectively. However, because the value u = 0 under H0 lies on the boundary of the parameter space, the log-likelihood ratio,

\[
\mathrm{LR} = -2\left(\log L_0 - \log L_1\right),
\]

may not follow a standard chi-square distribution. Self and Liang [22] showed that the null distribution of the LR test statistic is a mixture of projections of chi-square variables onto surfaces, with mixture weights that can be derived analytically only in special cases. By establishing the asymptotic null and alternative distributions of the quasi-likelihood ratio, rescaled quasi-likelihood ratio, Wald, and score tests, Andrews [23] suggested the use of these test statistics to test the boundary value of a model parameter. While the first three test statistics are easy to compute, the score test is more difficult because it requires the first- and second-order derivatives of the alternative log-likelihood.

Similar tests can be performed for the individual HWD coefficients D1e, D2e, or D12, or for their combinations, by formulating the corresponding null hypotheses. The likelihood is also calculated under the alternative hypothesis H1 associated with each null hypothesis considered. The resulting LR value is assumed to be asymptotically chi-square distributed with degrees of freedom equal to the difference in the number of parameters estimated under the alternative and null hypotheses.
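The joint test of H0: D1e = D2e = D12 = 0 can be sketched as follows. The sketch relies on two assumptions that are not spelled out in the text: under H0 the MLEs of p and u are the allele-counting estimates, and the full model with five free parameters is saturated, so its fitted counts equal the observed counts. The LR statistic is then compared with a chi-square distribution with 5 - 2 = 3 degrees of freedom, whose upper tail has a closed form. All function names are hypothetical.

```python
import math


def hwe_frequencies(p, u):
    """Expected frequencies of A1A1, A1Ae, AeAe, A1A2, A2Ae, A2A2 when all D = 0."""
    q = 1.0 - p
    return [
        (1 - u) ** 2 * p ** 2,
        2 * u * (1 - u) * p ** 2,
        u ** 2 * p ** 2,
        2 * (1 - u) * p * q,
        2 * u * p * q,
        q ** 2,
    ]


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def hwd_lr_test(counts):
    """counts = [n11, n1e, nee, n12, n2e, n22]; returns (LR, p-value)."""
    n11, n1e, nee, n12, n2e, n22 = counts
    n = sum(counts)
    a1_origin = n11 + n1e + nee + 0.5 * (n12 + n2e)
    if a1_origin == 0:
        raise ValueError("need at least one individual carrying A1 or Ae")
    p = a1_origin / n
    u = (nee + 0.5 * (n1e + n2e)) / a1_origin
    expected = [n * f for f in hwe_frequencies(p, u)]
    # LR = -2(log L0 - log L1); empty classes contribute zero.
    lr = 2 * sum(o * math.log(o / e) for o, e in zip(counts, expected) if o > 0)
    return lr, chi2_sf_3df(lr)


if __name__ == "__main__":
    print(hwd_lr_test([1200, 0, 400, 3600, 1200, 3600]))
```

This sketch does not address the boundary problem discussed for H0: u = 0, because zero disequilibrium is an interior point of the parameter space whenever all six expected frequencies are positive.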

Genetic and epigenetic effect

We assume that the study population is investigated under a uniform condition so that the phenotypic variation can be simply partitioned into genetic/epigenetic components and errors. There are only three genotypes, A1A1, A1A2, and A2A2, prior to DNA methylation. Let a denote the additive effect of the nucleotide site due to the substitution of allele A1 by A2 or vice versa and d denote the dominant effect due to the interaction between the two alleles. The values of three genotypes are diagrammed over an axis as follows:

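\[
\begin{array}{lcccc}
\text{Genotype} & A_2A_2 & & A_1A_2 & A_1A_1 \\
\text{Genotypic value} & \mu - a & \mu & \mu + d & \mu + a \\
\text{Net genotypic value} & -a & 0\ (\text{origin}) & d & a
\end{array} \tag{8}
\]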

As described above, allele A1 is assumed to be methylated into the epiallele Ae. The values of six distinguishable genetic and epigenetic types are expressed as

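\[
\begin{array}{llll}
\text{Genotype} & \text{Genotype/epigenotype} & \text{Expected value} & \text{Estimated value} \\
A_1A_1 & A_1A_1\ (\text{no methylation}) & \mu + a_1 & \sum_{i=1}^{n_{11}} y_i / n_{11} \\
 & A_1A_e\ (\text{one methylation}) & \mu + \frac{1}{2}(a_1 + a_e) + d_{1e} & \sum_{i=1}^{n_{1e}} y_i / n_{1e} \\
 & A_eA_e\ (\text{two methylations}) & \mu + a_e & \sum_{i=1}^{n_{ee}} y_i / n_{ee} \\
A_1A_2 & A_1A_2\ (\text{no methylation}) & \mu - \frac{1}{2}a_e + d_{12} & \sum_{i=1}^{n_{12}} y_i / n_{12} \\
 & A_2A_e\ (\text{one methylation}) & \mu - \frac{1}{2}a_1 + d_{2e} & \sum_{i=1}^{n_{2e}} y_i / n_{2e} \\
A_2A_2 & A_2A_2\ (\text{no methylation}) & \mu - a_1 - a_e & \sum_{i=1}^{n_{22}} y_i / n_{22}
\end{array} \tag{9}
\]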

where the genotypic value of the trait is decomposed into different components, i.e., the overall mean (μ), the additive effects due to the substitution of allele A1 (a1) and epiallele Ae by allele A2 (ae), and the dominance effects due to the interaction between allele A1 and epiallele Ae (d1e), between allele A1 and allele A2 (d12) and between allele A2 and epiallele Ae (d2e).

Let yi denote the phenotypic value of the trait for individual i (i = 1, …, n) in the study population. The MLE of the genotypic value for each genotype/epigenotype can be obtained by simply taking the mean over all individuals belonging to this genotype/epigenotype (9). The genetic and epigenetic effects can then be estimated by solving the system of equations formed by the expected genotypic values in (9), i.e.,

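\[
\hat{a}_1 = \frac{1}{3}\left[\frac{2\sum_{i=1}^{n_{11}} y_i}{n_{11}} - \left(\frac{\sum_{i=1}^{n_{ee}} y_i}{n_{ee}} + \frac{\sum_{i=1}^{n_{22}} y_i}{n_{22}}\right)\right] \tag{10}
\]

\[
\hat{a}_e = \frac{1}{3}\left[\frac{2\sum_{i=1}^{n_{ee}} y_i}{n_{ee}} - \left(\frac{\sum_{i=1}^{n_{11}} y_i}{n_{11}} + \frac{\sum_{i=1}^{n_{22}} y_i}{n_{22}}\right)\right] \tag{11}
\]

\[
\hat{d}_{1e} = \frac{\sum_{i=1}^{n_{1e}} y_i}{n_{1e}} - \frac{1}{2}\left(\frac{\sum_{i=1}^{n_{11}} y_i}{n_{11}} + \frac{\sum_{i=1}^{n_{ee}} y_i}{n_{ee}}\right) \tag{12}
\]

\[
\hat{d}_{2e} = \frac{\sum_{i=1}^{n_{2e}} y_i}{n_{2e}} - \frac{1}{2}\left(\frac{\sum_{i=1}^{n_{22}} y_i}{n_{22}} + \frac{\sum_{i=1}^{n_{ee}} y_i}{n_{ee}}\right) \tag{13}
\]

\[
\hat{d}_{12} = \frac{\sum_{i=1}^{n_{12}} y_i}{n_{12}} - \frac{1}{2}\left(\frac{\sum_{i=1}^{n_{11}} y_i}{n_{11}} + \frac{\sum_{i=1}^{n_{22}} y_i}{n_{22}}\right) \tag{14}
\]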
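The estimates (10)-(14) only require the six group means, which the following Python sketch makes explicit. The function name and the dictionary keys ("11", "1e", "ee", "12", "2e", "22") are illustrative. The estimate of the overall mean is not given as a numbered equation; it is taken here as the average of the three homozygote means, which follows from the expected values in (9).

```python
from statistics import mean


def estimate_effects(phenotypes):
    """phenotypes maps '11', '1e', 'ee', '12', '2e', '22' to lists of trait values."""
    ybar = {label: mean(values) for label, values in phenotypes.items()}

    mu = (ybar["11"] + ybar["ee"] + ybar["22"]) / 3            # implied by (9)
    a1 = (2 * ybar["11"] - ybar["ee"] - ybar["22"]) / 3        # equation (10)
    ae = (2 * ybar["ee"] - ybar["11"] - ybar["22"]) / 3        # equation (11)
    d1e = ybar["1e"] - 0.5 * (ybar["11"] + ybar["ee"])         # equation (12)
    d2e = ybar["2e"] - 0.5 * (ybar["22"] + ybar["ee"])         # equation (13)
    d12 = ybar["12"] - 0.5 * (ybar["11"] + ybar["22"])         # equation (14)
    return {"mu": mu, "a1": a1, "ae": ae, "d1e": d1e, "d2e": d2e, "d12": d12}


if __name__ == "__main__":
    example = {
        "11": [10.3, 10.5], "1e": [10.2, 10.3], "ee": [10.0, 10.1],
        "12": [9.9, 10.1], "2e": [9.8, 9.9], "22": [9.5, 9.6],
    }
    print(estimate_effects(example))
```

Every genotype/epigenotype class must contain at least one individual, otherwise the corresponding mean is undefined.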

Each of these effects (10) – (14) can be tested by the log-likelihood ratio approach. For an epigenetic study, we are more interested in testing the epigenetic effect of the nucleotide site, ae, and the dominant effects due to the interactions between the alleles and the epiallele, d1e and d2e. The log-likelihood ratio test statistic for each hypothesis test is assumed to be asymptotically chi-square distributed with degrees of freedom equal to the difference in the number of parameters estimated under the alternative and null hypotheses.

Genetic and epigenetic variation

We first give the genetic variance explained by the nucleotide site studied prior to DNA methylation. By defining a new parameter called the average effect, $\alpha = a + (q - p)d$ [20], we derive the overall genetic variance of the trait due to this site as

\[
\sigma_g^2 = 2pq\alpha^2 + (2pqd)^2 = \sigma_a^2 + \sigma_d^2 \tag{15}
\]

where $\sigma_a^2 = 2pq\alpha^2$ is the additive genetic variance, which depends on both a and d, and $\sigma_d^2 = (2pqd)^2$ is the dominant genetic variance, which depends only on d. Both variances are affected by the relative magnitudes of the allele frequencies p and q. The dominant variance reaches its maximum when the two alleles A1 and A2 occur at the same frequency (p = q = 1/2), as does the additive variance when d = 0.

In what follows, we model how the epigenetic change contributes to the genetic variance of a complex trait based on the frequencies (1) and values of genotypes/epigenotypes (9). The total genetic variation among the six genotypes/epigenotypes is derived as

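\[
\begin{aligned}
\sigma_G^2 ={}& a_1^2\left[(1-u)^2p^2 + D_{12} + D_{1e}\right] + a_e^2\left[u^2p^2 + D_{1e} + D_{2e}\right] + (a_1 + a_e)^2\left[q^2 + D_{12} + D_{2e}\right] \\
&+ \left[\tfrac{1}{2}(a_1 + a_e) + d_{1e}\right]^2 \times \left[2u(1-u)p^2 - 2D_{1e}\right] \\
&+ \left(-\tfrac{1}{2}a_1 + d_{2e}\right)^2 \times \left(2upq - 2D_{2e}\right) \\
&+ \left(-\tfrac{1}{2}a_e + d_{12}\right)^2 \times \left[2(1-u)pq - 2D_{12}\right] - m^2
\end{aligned} \tag{16}
\]

where m is the population mean expressed as

\[
m = a_1\left[(1-u)p - q\right] + a_e(up - q) + 2d_{1e}\left[u(1-u)p^2 - D_{1e}\right] + 2d_{2e}\left(upq - D_{2e}\right) + 2d_{12}\left[(1-u)pq - D_{12}\right]
\]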

It can be seen from equation (16) that the total genetic variance includes 15 different parts, i.e.,

\[
\begin{aligned}
\sigma_G^2 ={}& \sigma_{a_1}^2 && \text{(additive effect of the original alleles prior to methylation)} \\
&+ \sigma_{a_e}^2 && \text{(additive effect of the epiallele)} \\
&+ \sigma_{d_{1e}}^2 && \text{(dominant effect between original allele } A_1 \text{ and the epiallele)} \\
&+ \sigma_{d_{2e}}^2 && \text{(dominant effect between original allele } A_2 \text{ and the epiallele)} \\
&+ \sigma_{d_{12}}^2 && \text{(dominant effect between the original alleles)} \\
&+ \sigma_{a_1 \times a_e}^2 && \text{(multiplicative additive} \times \text{additive effect involving the epiallele)} \\
&+ \sigma_{a_1 \times d_{1e}}^2 && \text{(multiplicative additive} \times \text{dominant effect involving the epiallele)} \\
&+ \sigma_{a_1 \times d_{2e}}^2 && \text{(multiplicative additive} \times \text{dominant effect involving the epiallele)} \\
&+ \sigma_{a_1 \times d_{12}}^2 && \text{(multiplicative additive} \times \text{dominant effect with no epiallele)} \\
&+ \sigma_{a_e \times d_{1e}}^2 && \text{(multiplicative additive} \times \text{dominant effect involving the epiallele)} \\
&+ \sigma_{a_e \times d_{2e}}^2 && \text{(multiplicative additive} \times \text{dominant effect involving the epiallele)} \\
&+ \sigma_{a_e \times d_{12}}^2 && \text{(multiplicative additive} \times \text{dominant effect involving the epiallele)} \\
&+ \sigma_{d_{1e} \times d_{2e}}^2 && \text{(multiplicative dominant} \times \text{dominant effect involving the epiallele)} \\
&+ \sigma_{d_{1e} \times d_{12}}^2 && \text{(multiplicative dominant} \times \text{dominant effect involving the epiallele)} \\
&+ \sigma_{d_{2e} \times d_{12}}^2 && \text{(multiplicative dominant} \times \text{dominant effect involving the epiallele)}
\end{aligned}
\]

Here, we define a new heritability, called the epigenetic heritability, which describes the proportion of the phenotypic variance explained by the effect of the epiallele and its interactions with the other effects, expressed as

\[
H_e^2 = \frac{\sigma_G^2 - \sigma_{a_1}^2 - \sigma_{d_{12}}^2 - \sigma_{a_1 \times d_{12}}^2}{\sigma_P^2} \tag{17}
\]

Also, we use the proportion of the epigenetic variance to the total genetic variance to describe the relative contribution of epigenetic methylation to the overall genetic variance, expressed as

\[
R_e^2 = \frac{\sigma_G^2 - \sigma_{a_1}^2 - \sigma_{d_{12}}^2 - \sigma_{a_1 \times d_{12}}^2}{\sigma_G^2} \tag{18}
\]

These two parameters can be used to assess the contribution of DNA methylation to the total phenotypic variation of a quantitative trait.
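A numerical sketch of equations (16) and (18) is given below. The total variance is computed as the frequency-weighted mean of squared net genotypic values minus the squared population mean, using the frequencies in (1) and the values in (9). The individual variance components are not written out in the text, so the sketch makes one labeled assumption: because the total variance is a quadratic form in (a1, ae, d1e, d2e, d12), the three non-epigenetic parts are obtained by evaluating (16) with ae = d1e = d2e = 0. Function names and the tuple layout of the arguments are hypothetical.

```python
def genotype_frequencies(p, u, D1e=0.0, D2e=0.0, D12=0.0):
    """Frequencies of A1A1, A1Ae, AeAe, A1A2, A2Ae, A2A2 as listed in (1)."""
    q = 1.0 - p
    return [
        (1 - u) ** 2 * p ** 2 + D12 + D1e,
        2 * u * (1 - u) * p ** 2 - 2 * D1e,
        u ** 2 * p ** 2 + D1e + D2e,
        2 * (1 - u) * p * q - 2 * D12,
        2 * u * p * q - 2 * D2e,
        q ** 2 + D12 + D2e,
    ]


def net_values(a1, ae, d1e, d2e, d12):
    """Genotypic values in (9) with the overall mean mu removed, same order as above."""
    return [
        a1,
        0.5 * (a1 + ae) + d1e,
        ae,
        -0.5 * ae + d12,
        -0.5 * a1 + d2e,
        -(a1 + ae),
    ]


def genetic_variance(pop, eff):
    """Equation (16); pop = (p, u, D1e, D2e, D12), eff = (a1, ae, d1e, d2e, d12)."""
    freqs = genotype_frequencies(*pop)
    values = net_values(*eff)
    m = sum(f * g for f, g in zip(freqs, values))          # population mean
    return sum(f * g * g for f, g in zip(freqs, values)) - m * m


def epigenetic_proportion(pop, eff):
    """Equation (18), under the quadratic-form assumption stated in the lead-in."""
    a1, _, _, _, d12 = eff
    total = genetic_variance(pop, eff)
    non_epigenetic = genetic_variance(pop, (a1, 0.0, 0.0, 0.0, d12))
    return (total - non_epigenetic) / total


if __name__ == "__main__":
    pop = (0.4, 0.2, 0.0, 0.0, 0.0)
    eff = (0.4, 0.05, 0.05, 0.05, 0.05)
    print(genetic_variance(pop, eff), epigenetic_proportion(pop, eff))
```

With u = 0, no disequilibrium and ae = 0, the function reduces to the classical single-site variance in (15), which gives a simple consistency check.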

Numerical analysis

In this section, we perform numerical analyses to investigate how epigenetic marks contribute to the heritability of a complex trait. The occurrence of epigenetic marks is described by population genetic parameters including the occurrence rate of the epiallele and its Hardy-Weinberg disequilibria with unmarked alleles. The effect of epigenetic marks can be specified by quantitative genetic parameters including the epigenetic effect of the epiallele and its interactions with other effects. As analyzed above, population genetic parameters (p, q, u, D1e, D2e, D12) and quantitative genetic parameters (a1, ae, d1e, d2e, d12) contribute to the genetic variance in a complex way (16). We analyze the contribution of epigenetic marks by separately investigating how these population and quantitative genetic parameters affect $R_e^2$.

Population genetic effect

Suppose there is a study population in which methylated sites are observed for a phenotypic trait. Consider a nucleotide site with two alleles A1 and A2, one of which, say A1, is methylated at a rate u (u takes any value in [0,1]). This methylation may violate the previous HWE assumption. Based on a simple algebraic analysis, we obtain the intervals of D1e, D2e and D12 as follows:

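\[
\begin{aligned}
-\tfrac{1}{2}\left[(1-u)^2p^2 + u^2p^2 + D_{12} + D_{2e}\right] &\le D_{1e} \le u(1-u)p^2 \\
-\tfrac{1}{2}\left[u^2p^2 + q^2 + D_{1e} + D_{12}\right] &\le D_{2e} \le upq \\
-\tfrac{1}{2}\left[(1-u)^2p^2 + q^2 + D_{1e} + D_{2e}\right] &\le D_{12} \le (1-u)pq
\end{aligned}
\]

These bounds follow from requiring each of the six frequencies in (1) to be non-negative.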
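Whether a given parameter set is admissible can be checked numerically by requiring that none of the six frequencies in (1) is negative. The sketch below does this and also returns the exact admissible range of D1e for fixed values of the other parameters, obtained from the non-negativity of the A1A1, AeAe and A1Ae frequencies. The function names are hypothetical and the example values are arbitrary.

```python
def genotype_frequencies(p, u, D1e=0.0, D2e=0.0, D12=0.0):
    """Frequencies of A1A1, A1Ae, AeAe, A1A2, A2Ae, A2A2 as listed in (1)."""
    q = 1.0 - p
    return [
        (1 - u) ** 2 * p ** 2 + D12 + D1e,
        2 * u * (1 - u) * p ** 2 - 2 * D1e,
        u ** 2 * p ** 2 + D1e + D2e,
        2 * (1 - u) * p * q - 2 * D12,
        2 * u * p * q - 2 * D2e,
        q ** 2 + D12 + D2e,
    ]


def is_feasible(p, u, D1e=0.0, D2e=0.0, D12=0.0, tol=1e-12):
    """True when every genotype/epigenotype frequency is non-negative."""
    return all(f >= -tol for f in genotype_frequencies(p, u, D1e, D2e, D12))


def d1e_interval(p, u, D2e=0.0, D12=0.0):
    """Range of D1e that keeps the A1A1, AeAe and A1Ae frequencies non-negative."""
    lower = max(-(1 - u) ** 2 * p ** 2 - D12, -(u ** 2) * p ** 2 - D2e)
    upper = u * (1 - u) * p ** 2
    return lower, upper


if __name__ == "__main__":
    print(is_feasible(0.4, 0.2, 0.01, 0.01, 0.0))
    print(d1e_interval(0.4, 0.2))
```

A check of this kind is useful before evaluating the variance formulas, since a parameter set that yields a negative frequency does not correspond to any population.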

DNA methylation changes the genetic variance explained by the site. By fixing the quantitative genetic parameters, we examined the impact of different occurrence rates of methylation and different HWD coefficients on the epigenetic variance. Even a small occurrence rate may produce substantial epigenetic variance, although this depends on the degree of disequilibrium between the two original alleles following methylation (Figure 1). The epigenetic variance is also positively associated with the degree of disequilibrium between the unmarked alleles and the epiallele (Figure 2).

Figure 1. Change of the proportion of the epigenetic variance over the total genetic variance ($R_e^2$) as a function of the occurrence rate of methylation in a natural population. The total and epigenetic genetic variances are calculated by assuming population genetic parameters (p, q, u, D1e, D2e, D12) ≡ (0.4, 0.6, u, 0.05, 0.05, D12) (allowing u and D12 to change) and quantitative genetic parameters (a1, ae, d1e, d2e, d12) ≡ (0.4, 0.05, 0.05, 0.05, 0.05).

Figure 2. Change of the proportion of the epigenetic variance over the total genetic variance ($R_e^2$) as a function of the Hardy-Weinberg disequilibrium (HWD) coefficients formed between the original alleles and the epiallele in a natural population after DNA methylation. The total and epigenetic genetic variances are calculated by assuming population genetic parameters (p, q, u, D1e, D2e, D12) ≡ (0.4, 0.6, u, D1e, D2e, 0) (allowing u, D1e, and D2e to change) and quantitative genetic parameters (a1, ae, d1e, d2e, d12) ≡ (0.4, 0.05, 0.05, 0.05, 0.05).

Quantitative genetic effect

By fixing population genetic parameters, the influence of genetic effects triggered by the epiallele was investigated. A small value of the additive effect ae formed by the epiallele brings about considerable epigenetic variance (Figure 3). This influence increases with increasing ae values. The epigenetic variance is also remarkably affected by the dominant effect between the original alleles and epiallele (Figure 4). It is clear that these effect parameters contribute to the epigenetic variance also through their complex interactions.

Figure 3. Change of the proportion of the epigenetic variance over the total genetic variance ($R_e^2$) as a function of the additive genetic effect due to the substitution of the original allele by the epiallele. The total and epigenetic genetic variances are calculated by assuming population genetic parameters (p, q, u, D1e, D2e, D12) ≡ (0.4, 0.6, 0.2, 0, 0, 0) and quantitative genetic parameters (a1, ae, d1e, d2e, d12) ≡ (a1, ae, 0.05, 0.05, 0.05) (allowing a1 and ae to change).

Figure 4. Change of the proportion of the epigenetic variance over the total genetic variance ($R_e^2$) as a function of the dominant genetic effect due to the interaction between the original allele and epiallele. The total and epigenetic genetic variances are calculated by assuming population genetic parameters (p, q, u, D1e, D2e, D12) ≡ (0.4, 0.6, 0.2, 0.01, 0.01, 0) and quantitative genetic parameters (a1, ae, d1e, d2e, d12) ≡ (0.08, 0.12, d1e, d2e, d12) (allowing d1e, d2e and d12 to change).

Computer simulation

Our model allows the estimation and testing of epigenetic effects. We carried out simulation studies to examine the statistical properties of the model. A study population was simulated by assuming a set of population and quantitative genetic parameters and a normally distributed residual error with mean zero and variance scaled to give a range of trait heritabilities. As expected, the estimation precision increases with increasing sample size and heritability. A sample size of 400 is sufficient to provide reasonable estimates of all population genetic parameters (Table 1). Note that the estimation precision of the population parameters does not depend on the heritability. In general, reasonable estimation of the quantitative genetic parameters, especially the dominant genetic effects, needs a much larger sample size, say 1000 (Table 1). As expected, the estimation precision of genetic effects is sensitive to heritability. In practice, every effort should be made to measure the phenotypic trait precisely, so as to increase the level of heritability.
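The simulation design described above can be sketched in a few lines of Python. This is a simplified illustration rather than the procedure used to produce Table 1: it assumes no Hardy-Weinberg disequilibrium (all D = 0), draws genotypes/epigenotypes from the frequencies in (1), adds a normal residual whose variance is chosen so that the genetic variance makes up a fraction h2 of the phenotypic variance, and re-estimates the effects with (10)-(14). All names and default values are hypothetical.

```python
import random
from statistics import mean

LABELS = ["11", "1e", "ee", "12", "2e", "22"]


def simulate_population(n, p, u, effects, h2, mu=0.0, seed=1):
    """Simulate trait values per genotype/epigenotype class, assuming all D = 0."""
    a1, ae, d1e, d2e, d12 = effects
    q = 1.0 - p
    freqs = [(1 - u) ** 2 * p ** 2, 2 * u * (1 - u) * p ** 2, u ** 2 * p ** 2,
             2 * (1 - u) * p * q, 2 * u * p * q, q ** 2]
    values = [a1, 0.5 * (a1 + ae) + d1e, ae,
              -0.5 * ae + d12, -0.5 * a1 + d2e, -(a1 + ae)]
    m = sum(f * g for f, g in zip(freqs, values))
    var_g = sum(f * g * g for f, g in zip(freqs, values)) - m * m
    sd_e = (var_g * (1 - h2) / h2) ** 0.5   # so that var_g / (var_g + var_e) = h2

    rng = random.Random(seed)
    data = {label: [] for label in LABELS}
    for k in rng.choices(range(6), weights=freqs, k=n):
        data[LABELS[k]].append(mu + values[k] + rng.gauss(0.0, sd_e))
    return data


def estimate_effects(data):
    """Equations (10)-(14); every class must contain at least one individual."""
    ybar = {label: mean(ys) for label, ys in data.items()}
    a1 = (2 * ybar["11"] - ybar["ee"] - ybar["22"]) / 3
    ae = (2 * ybar["ee"] - ybar["11"] - ybar["22"]) / 3
    d1e = ybar["1e"] - 0.5 * (ybar["11"] + ybar["ee"])
    d2e = ybar["2e"] - 0.5 * (ybar["22"] + ybar["ee"])
    d12 = ybar["12"] - 0.5 * (ybar["11"] + ybar["22"])
    return a1, ae, d1e, d2e, d12


if __name__ == "__main__":
    truth = (0.4, 0.2, 0.1, 0.1, 0.1)
    for n in (400, 1000, 20000):
        sample = simulate_population(n, p=0.4, u=0.3, effects=truth, h2=0.4)
        print(n, [round(e, 3) for e in estimate_effects(sample)])
```

Repeating the simulation over many seeds and recording the spread of the estimates gives an empirical view of how precision changes with sample size and heritability. The rarest class (AeAe, with frequency u²p²) limits the precision of ae, d1e and d2e.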

Table 1. MLEs of population and quantitative genetic parameters from simulated data with different heritabilities (H2) and sample sizes (n)

We also investigated the power of detecting epiallelic HWD and epigenetic effects, as well as the false positive rates for epigenetic effect identification, under different heritabilities and sample sizes (Table 2). Given a medium sample size of 400, the model possesses adequate power (> 0.95) for the detection of small epiallelic HWD coefficients, along with small false positive rates (< 0.10). The power of the model to detect epigenetic effects was calculated by testing the hypothesis H0: ae = d1e = d2e = 0 vs. H1: at least one of the effects in H0 is not equal to zero, and comparing the resulting log-likelihood ratio test statistic with the critical threshold of a chi-square distribution with three degrees of freedom. The proportion of simulation replicates that reject the null hypothesis is used as the empirical power of the model. The power of epigenetic effect detection is very sensitive to the magnitude of the epigenetic effect, the heritability and the sample size (Table 2). When the epigenetic effect is small, the model has low power to detect it, although the power increases with increasing heritability and sample size. To detect a small epigenetic effect, a large sample size (2000 or more) is required, together with a precisely measured phenotype (with a large heritability). For a medium-size epigenetic effect, a sample size of 1000 may be adequate for its detection if the phenotype is precisely measured. In general, the model has reasonably small false positive rates even for a medium sample size (Table 2).

Table 2. The power of epigenetic-effect detection by the epigenetic model and its false positive rates (FPR) under different sample sizes (n) and heritabilities (H2)

Implementing the epigenetic model into GWAS

The proposed epigenetic model can be implemented in genome-wide association studies (GWAS). In GWAS, it is likely that about a million methylated sites are detected throughout the entire genome on a much smaller number of samples. Moreover, samples collected for human GWAS are highly heterogeneous in terms of genetic background, gender, age, race, and many other demographic characteristics. These demographic factors should be modeled as covariates. For a single methylated site, we can build a linear model to describe the phenotypic value of individual i by considering its multifactorial determinants, expressed as

\[
y_i = \mu + \xi_{i1}a_1 + \xi_{i2}a_e + \xi_{i3}d_{1e} + \xi_{i4}d_{2e} + \xi_{i5}d_{12} + \sum_{r=1}^{R} \alpha_r u_{ir} + \sum_{s=1}^{S}\sum_{l=1}^{L_s} x_{isl}v_{sl} + e_i \tag{19}
\]

where ξi1, …, ξi5 are the indicator variables for subject i that correspond to the specific genetic or epigenetic effects at a methylated site; uir (r = 1, …, R) is the value of the rth continuous covariate, such as age or BMI, for subject i; αr is the effect of the rth continuous covariate; vsl (l = 1, …, Ls; s = 1, …, S) is the effect of the lth level of the sth discrete covariate, such as race, gender, or treatment, with $\sum_{l=1}^{L_s} v_{sl} = 0$, where Ls is the number of levels of the sth discrete covariate; xisl is an indicator variable that equals 1 if subject i receives the lth level of the sth discrete covariate; and ei is a random error.

A standard multiple linear regression approach can be used to estimate all the effects described in model (19). If the test is made individually for each of the methylated sites, the significance of each effect should be adjusted by multiple comparison approaches such as Bonferroni or FDR.
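The sketch below shows one way to set up a row of the design matrix for model (19) and to adjust per-site p-values. The coding of ξi1, …, ξi5 is not listed explicitly in the text; the values used here are an assumption read off the expected genotypic values in (9). Discrete covariates are effect-coded so that their level effects sum to zero, as required in (19). The Bonferroni and Benjamini-Hochberg functions are the standard procedures; all names are hypothetical.

```python
# Coefficients of (a1, ae, d1e, d2e, d12) implied by the expected values in (9).
XI = {
    "A1A1": (1.0, 0.0, 0.0, 0.0, 0.0),
    "A1Ae": (0.5, 0.5, 1.0, 0.0, 0.0),
    "AeAe": (0.0, 1.0, 0.0, 0.0, 0.0),
    "A1A2": (0.0, -0.5, 0.0, 0.0, 1.0),
    "A2Ae": (-0.5, 0.0, 0.0, 1.0, 0.0),
    "A2A2": (-1.0, -1.0, 0.0, 0.0, 0.0),
}


def effect_code(level, n_levels):
    """Sum-to-zero coding: the last level is minus the sum of the others."""
    if level == n_levels - 1:
        return [-1.0] * (n_levels - 1)
    return [1.0 if j == level else 0.0 for j in range(n_levels - 1)]


def design_row(genotype, continuous, discrete, n_levels):
    """Intercept, five genetic/epigenetic columns, then covariate columns."""
    row = [1.0] + list(XI[genotype]) + [float(x) for x in continuous]
    for level, size in zip(discrete, n_levels):
        row.extend(effect_code(level, size))
    return row


def bonferroni(pvalues, alpha=0.05):
    """Reject when p <= alpha / (number of tests)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up FDR procedure: reject the k smallest p-values, k being the
    largest rank with p_(k) <= alpha * k / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # One subject: genotype A1Ae, age 35, second of three levels of one factor.
    print(design_row("A1Ae", [35.0], [1], [3]))
    print(benjamini_hochberg([0.001, 0.02, 0.03, 0.5]))
```

Stacking such rows over all subjects gives the matrix to which ordinary least squares can be applied. The resulting per-site p-values for ae, d1e and d2e are then passed to one of the two adjustment functions.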

Analyzing one methylated site at a time limits statistical inference about the overall genetic and epigenetic architecture of complex phenotypes. A more complete picture can be obtained by analyzing all sites simultaneously. Li et al. [24] proposed a new approach that incorporates the least absolute shrinkage and selection operator (lasso) [25] to simultaneously analyze a large number of variables using a much smaller sample size. A detailed algorithm for the Bayesian lasso has been derived [24] and can be readily implemented in GWAS aimed at identifying epigenetic variants.


Epigenetic alterations have been increasingly recognized to play an important role in generating and maintaining quantitative genetic variation for complex phenotypes underlying physiology and disease [6,7,9,26-28]. Preliminary estimates in plants suggest that they can account for up to 30% of the variation in commonly studied phenotypes such as height and flowering time [8]. Many theoretical models are available to analyze the contributions of epigenetic marks to missing heritability in genome-wide association studies (GWAS) [14-18]. In this article, we extended Mendelian inheritance-based genetic principles to derive a quantitative framework by which to analyze the pattern of how DNA methylation contributes to overall genetic variance. By defining several epigenetic effect parameters, the analytical framework allows the mechanistic characterization of epigenetic actions within the quantitative genetic context.

Numerical analysis shows that a small incidence of DNA methylation, as well as a small effect due to methylation alterations, could lead to a substantial increase of genetic variance, suggesting that epigenetic marks may be an important cause of genetic diversity in nature. Given this finding, the neglect of epigenetic variants in many current GWAS may partly explain the problem of missing heritability [17]. Simulation studies suggest that the model can provide reasonable estimates of epigenetic effect parameters with a sample size of 200 – 400, even when the trait studied has a small heritability. It should be pointed out, however, that this conclusion is based on a well-controlled study in which there is little background noise. For GWAS in humans, the estimated genetic variation is likely to be confounded by many factors, such as population structure, heterogeneous genetic background, demographic complexity, and highly noisy phenotypic measurements, among others. To remove these confounding effects from genetic and epigenetic analysis, a considerably larger sample size may be needed.

The model only considers a single methylated site. However, there is no technical difficulty in extending the model to explore two or more sites at the same time, which may interact with each other to produce a complex network of epistasis [29]. For two methylated sites, a total of 25 interaction parameters are formed between the parameter sets (a1, ae, d1e, d2e, d12) of the two sites. In this case, an exponentially larger sample size and more precise phenotypic measurement (aimed at increasing the trait’s heritability) are needed. For the methylated population, the original HWE assumption may be violated, in which case gametic linkage disequilibria cannot be used to specify the association between the two sites. Wu et al. [30] proposed a robust approach to analyze the marker-marker association by deriving a so-called zygotic linkage disequilibrium model. Wu et al.’s approach can be incorporated to identify the contribution of epigenetic marks at two sites to the overall genetic variance.

Epigenetic changes may be an adaptation to environmental perturbations [5,17,28]. Thus, it is crucial to incorporate the epigenetic model into a genotype-environment interaction study. By doing so, we can identify which epigenetic effects interact with the environment, and how, to determine final phenotypes, so that the genetic etiology of quantitative variation can be better elucidated. In addition, there is a considerable body of evidence that epigenetic effects may be transmitted from one generation to the next [31,32], although other studies found reprogramming of epigenetic effects during meiosis [5,33,34]. By embedding our epigenetic model into a family-based design, we can develop a powerful approach to test the relative importance of these two phenomena in trait control [35-37]. Traditional models analyze the inheritance of quantitative traits based on Mendel’s laws and therefore cannot capture the contribution of epigenetic modifications. In addition, many GWAS are based on a case–control study in which genotype frequencies are compared between two groups. To study the association between epigenetic effects and a particular disease, such as cancer, we can incorporate quantitative epigenetic models as described by equations (10) – (14) into a case–control framework, allowing each effect to be tested. The integration of general quantitative genetic models and a case–control design has been discussed and its statistical properties investigated through analytical derivations and computer simulations [38-40]. With these extensions, the new model proposed in this article, which integrates traditional quantitative genetic theory with the latest discoveries of epigenetic effects, will allow geneticists to chart a more comprehensive picture of the genetic landscape for complex phenotypes underlying agricultural production, physiology and human diseases.

Competing interests

The authors declare that there are no competing interests.

Authors’ contributions

ZW designed the algorithm and conducted the simulation experiments. ZHW derived the statistical model for hypothesis tests. JW and YHS participated in computer simulation. JZ provided biological insights for the statistical model. DL supervised the project. RW conceived of the model, designed the computer simulation and wrote the manuscript. All authors read and approved the final manuscript.


This work is partially supported by NSF/IOS-0923975, NIH/UL1RR0330184 and the Nantong “Jianghai Elites” program.


  1. Rutherford SL, Henikoff S: Quantitative epigenetics.

    Nat Genet 2003, 33:6-8.

  2. Richards EJ: Inherited epigenetic variation–revisiting soft inheritance.

    Nat Rev Genet 2006, 7:395-401.

  3. Richards EJ: Quantitative epigenetics: DNA sequence variation need not apply.

    Genes Dev 2009, 23:1601-1605.

  4. Richards EJ: Natural epigenetic variation in plant species: a view from the field.

    Curr Opin Plant Biol 2011, 14:204-209.

  5. Richards CL, Bossdorf O, Pigliucci M: What role does heritable epigenetic variation play in phenotypic evolution?

    Bioscience 2010, 60:232-237.

  6. Feinberg AP: Phenotypic plasticity and the epigenetics of human disease.

    Nature 2007, 447:433-440.

  7. Feinberg AP, Irizarry RA: Stochastic epigenetic variation as a driving force of development, evolutionary adaptation, and disease.

    Proc Natl Acad Sci USA 2010, 107:1757-1764.

  8. Johannes F, Porcher E, Teixeira FK, Saliba-Colombani V, Simon M, Agier N, Bulski A, Albuisson J, Heredia F, Audigier P, Bouchez D, Dillmann C, Guerche P, Hospital F, Colot V: Assessing the impact of transgenerational epigenetic variation on complex traits.

    PLoS Genet 2009, 5:e1000530.

  9. Eichten SR, Swanson-Wagner RA, Schnable JC, Waters AJ, Hermanson PJ, Liu S, Yeh CT, Jia Y, Gendler K, Freeling M, Schnable PS, Vaughn MW, Springer NM: Heritable epigenetic variation among maize inbreds.

    PLoS Genet 2011, 7(11):e1002372.

  10. Johannes F, Colot V, Jansen RC: Epigenome dynamics: a quantitative genetics perspective.

    Nat Rev Genet 2008, 9:883-890.

  11. Maher B: Personal genomes: the case of the missing heritability.

    Nature 2008, 456:18-21.

  12. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TF, McCarroll SA, Visscher PM: Finding the missing heritability of complex diseases.

    Nature 2009, 461:747-753. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  13. Eichler E, Flint J, Gibson G, Kong A, Leal S, Moore JH, Nadeau JH: Missing heritability and strategies for finding the underlying causes of complex disease.

    Nat Rev Genet 2010, 11:446-450. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  14. Slatkin M: Epigenetic inheritance and the missing heritability problem.

    Genetics 2009, 182:845-850. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  15. Tal O, Kisdi E, Jablonka E: Epigenetic contribution to covariance between relatives.

    Genetics 2010, 184:1037-1050. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  16. Johannes F, Colome-Tatche M: Quantitative epigenetics through epigenomic perturbation of isogenic lines.

    Genetics 2011, 188:215-227. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  17. Furrow RE, Christiansen FB, Feldman MW: Environment-sensitive epigenetics and the heritability of complex diseases.

    Genetics 2011, 189:1377-1387. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  18. Jaffe AE, Feinberg AP, Irizarry RA, Leek JT: Significance analysis and statistical dissection of variably methylated regions.

    Biostatistics 2012, 13:166-178. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  19. Roux F, Colome-Tatche M, Edelist C, Warenaar R, Guerche P, Hospital F, Colot V, Jansen RC, Johannes F: Genome-wide epigenetic perturbation jump-starts patterns of heritable variation found in nature.

    Genetics 2011, 188:1015-1017. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  20. Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. London: Longman; 1996. OpenURL

  21. Lynch M, Walsh B: Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Associates; 1998. OpenURL

  22. Self SG, Liang KY: Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions.

    J Am Stat Assoc 1987, 82:605-610. Publisher Full Text OpenURL

  23. Andrews DWK: Testing when a parameter is on the boundary of the maintained hypothesis.

    Econometrica 2001, 69:683-734. Publisher Full Text OpenURL

  24. Tibshirani R: Regression shrinkage and selction via the lasso.

    J R Stat Soc Ser B 1996, 58:267-288. OpenURL

  25. Li JH, Das K, Fu GF, Li RZ, Wu RL: The Bayesian lasso for genome-wide association studies.

    Bioinformatics 2011, 27:516-523. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  26. Feinberg AP, Tycko B: The history of cancer epigenetics.

    Nat Rev Cancer 2004, 4:143-153. PubMed Abstract | Publisher Full Text OpenURL

  27. Feinberg AP, Irizarry RA, Fradin D, Aryee MJ, Murakami P, et al.: Personalized epigenomic signatures that are stable over time and covary with body mass index.

    Sci Transl Med 2011, 3(65):65er1. OpenURL

  28. Petronis A: Epigenetics as a unifying principle in the aetiology of complex traits and diseases.

    Nature 2010, 465:721-727. PubMed Abstract | Publisher Full Text OpenURL

  29. Smith LM, Weigel D: On epigenetics and epistasis: hybrids and their non-additive interactions.

    EMBO J 2012, 31:249-250. OpenURL

  30. Wu S, Yang J, Wu RL: Genetic mapping of quantitative trait loci in a non-equilibrium population.

    Stat Appl Mol Genet Biol 2010, 9(1):32. OpenURL

  31. Reik W: The Wellcome Prize Lecture. Genetic imprinting: the battle of the sexes rages on.

    Exp Physiol 1996, 81:161-172. PubMed Abstract | Publisher Full Text OpenURL

  32. Reik W, Dean W, Walter J: Epigenetic reprogramming in mammalian development.

    Science 2001, 293:1089-1093. PubMed Abstract | Publisher Full Text OpenURL

  33. Youngson NA, Whitelaw E: Transgenerational epigenetic effects.

    Annu Rev Genomics Hum Genet 2008, 9:233-257. PubMed Abstract | Publisher Full Text OpenURL

  34. Whitelaw NC, Whitelaw E: Transgenerational epigenetic inheritance in health and disease.

    Curr Opin Genet Dev 2008, 18:273-279. PubMed Abstract | Publisher Full Text OpenURL

  35. Wang C, Wang Z, Luo J, Li Q, Li Y, Ahn K, Prows DR, Wu R: A model for transgenerational imprinting variation in complex traits.

    PLoS One 2010, 5(7):e11396. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  36. Wang CG, Wang Z, Prows DR, Wu RL: A computational framework for the inheritance of genomic imprinting for complex traits.

    Brief Bioinform 2012, 13:34-45. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  37. Li Y, Guo YQ, Hou W, Chang M, Liao LP, Wu RL: A statistical design for testing transgenerational genomic imprinting in natural human populations.

    PLoS One 2011, 6(2):e16858. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  38. Wang Z, Liu T, Lin Z, Hegarty J, Koltun WA, Wu R: A general model for multilocus epistatic interactions in case–control studies.

    PLoS One 2010, 5(8):e11384. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  39. Liu T, Thalamuthu A, Liu JJ, Chen C, Wang Z, Wu R: Asymptotic distribution for epistatic tests in case–control studies.

    Genomics 2011, 98:145-151. PubMed Abstract | Publisher Full Text OpenURL

  40. Zhang L, Liu R, Wang Z, Culver DA, Wu R: Modeling haplotype-haplotype interactions in case–control genetic association studies.

    Front Genet 2012, 3:2. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL