Siyang Science and Technology Station, Yuanpeng Institute of Genome, Nantong, Jiangsu, 226019, China

Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA

Division of Biostatistics, Yale University, New Haven, CT, 06510, USA

Center for Computational Biology, Beijing Forestry University, Beijing, 100083, China

Abstract

Background

Despite our increasing recognition of the mechanisms that specify and propagate epigenetic states of gene expression, the pattern of how epigenetic modifications contribute to the overall genetic variation of a phenotypic trait remains largely elusive.

Results

We construct a quantitative model to explore the effect of epigenetic modifications that occur at specific rates on the genome. This model, which derives from but extends beyond traditional quantitative genetic theory founded on Mendel’s laws, allows questions concerning the prevalence and importance of epigenetic variation to be incorporated and addressed.

Conclusions

The model provides a new avenue for bringing chromatin inheritance into the realm of complex traits, facilitating our understanding of the means by which phenotypic variation is generated.

Background

Systematic or stochastic changes in chromatin states, such as DNA methylation, chromatin remodeling, histone modification and RNA interference, have been thought to provide an additional driving force for phenotypic variation in complex traits and diseases.

Several publications have addressed methodological development for epigenetic detection.

Despite these advances, it remains unclear how much of the phenotypic variation is contributed by epigenetic modifications and, more importantly, how epialleles exert their effects on phenotypic values. The motivation of this article is to develop a quantitative model for estimating and testing the contribution of epigenetic variants to quantitative trait variation. The model allows the prediction of how much genetic variation is produced through a change in the rate of occurrence of epigenetic mutation and the effect of epigenetic factors in a natural population. We particularly discuss how the epigenetic effect interacts with other genetic effects, such as additive and dominance effects, to affect phenotypic traits. By implementing it into genome-wide association studies

Model

Occurrence rate of methylation

Consider an epigenetic study population in which a nucleotide site with two alleles, A_{1} and A_{2}, is thought to affect a phenotypic trait. Let p and q denote the frequencies of alleles A_{1} and A_{2} in the natural population at Hardy-Weinberg equilibrium (HWE), respectively. The genotypic frequencies of A_{1}A_{1}, A_{1}A_{2}, and A_{2}A_{2} at the nucleotide site studied are expressed as p^{2}, 2pq, and q^{2}, respectively.
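The HWE frequencies of the three genotypes can be checked numerically. The following is a minimal sketch; the function name and the dictionary keys are illustrative choices, not part of the original model.

```python
def hwe_genotype_frequencies(p):
    """Return HWE frequencies of A1A1, A1A2 and A2A2 given the frequency p of allele A1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p  # frequency of allele A2
    return {"A1A1": p * p, "A1A2": 2.0 * p * q, "A2A2": q * q}


# Example with p = 0.4, the allele frequency used in the numerical analysis.
freqs = hwe_genotype_frequencies(0.4)
```

The three frequencies always sum to one because (p + q)^{2} = 1.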

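At the nucleotide site studied, some cytosines within a CpG dinucleotide are methylated by adding a methyl group to the 5 position of the cytosine pyrimidine ring. Without loss of generality, allele A_{1} is assumed to be a cytosine that may be methylated into a new “allele” called the epiallele, denoted as A_{e}, at a rate u. The frequencies of allele A_{1}, epiallele A_{e} and allele A_{2} are then (1 – u)p, up, and q, respectively. The frequencies of the six resulting genotypes and epigenotypes are expressed in terms of these frequencies and three disequilibrium coefficients,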

where D_{12}, D_{1e}, and D_{2e} are the coefficients of Hardy-Weinberg disequilibrium (HWD) due to a non-random association between alleles A_{1} and A_{2}, between allele A_{1} and epiallele A_{e}, and between allele A_{2} and epiallele A_{e}, respectively. It is possible that the previous equilibrium of the population is violated by DNA methylation, leading to the HWD quantified by D_{12}, D_{1e}, and D_{2e}. Thus, the genotype and epigenotype frequencies may be determined by allele and epiallele frequencies and HWD coefficients.
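To make the dependence of the six frequencies on the allele frequencies, the methylation rate and the HWD coefficients concrete, the sketch below uses one common three-allele parametrization in which each heterozygote frequency is reduced by twice its pairwise coefficient and each homozygote frequency is increased by the coefficients involving its allele. This sign convention is an assumption; the article's own frequency equation may be written differently. The names `u`, `d12`, `d1e` and `d2e` are illustrative.

```python
def epigenotype_frequencies(p, u, d12=0.0, d1e=0.0, d2e=0.0):
    """Six genotype/epigenotype frequencies under an assumed HWD parametrization.

    p   : frequency of allele A1 before methylation (A2 has frequency 1 - p)
    u   : rate at which A1 is methylated into the epiallele Ae
    d12, d1e, d2e : pairwise disequilibrium coefficients (assumed convention:
                    a positive value means a deficit of that heterozygote)
    """
    p1, pe, p2 = (1.0 - u) * p, u * p, 1.0 - p
    freqs = {
        "A1A1": p1 * p1 + d12 + d1e,
        "A1Ae": 2.0 * p1 * pe - 2.0 * d1e,
        "AeAe": pe * pe + d1e + d2e,
        "A1A2": 2.0 * p1 * p2 - 2.0 * d12,
        "A2Ae": 2.0 * p2 * pe - 2.0 * d2e,
        "A2A2": p2 * p2 + d12 + d2e,
    }
    if any(f < 0.0 for f in freqs.values()):
        raise ValueError("disequilibrium coefficients give a negative frequency")
    return freqs


freqs = epigenotype_frequencies(p=0.4, u=0.2, d12=0.0, d1e=0.01, d2e=0.01)
# Marginal epiallele frequency recovered from the genotype frequencies.
epiallele_freq = freqs["AeAe"] + 0.5 * (freqs["A1Ae"] + freqs["A2Ae"])
```

Under this convention the disequilibrium terms cancel in the marginal sums, so the allele and epiallele frequencies stay at (1 – u)p, up and q whatever the coefficients are.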

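Let n_{11}, n_{1e}, n_{ee}, n_{12}, n_{2e}, and n_{22} (n_{11} + n_{1e} + n_{ee} + n_{12} + n_{2e} + n_{22} = n) denote the observed numbers of the six genotypes and epigenotypes in a sample of n subjects.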

We are interested in investigating whether there is significant occurrence of DNA methylation at the nucleotide site. This can be tested by formulating a null hypothesis, H_{0}: u = 0, against its alternative, H_{1}; the likelihoods under the two hypotheses (L_{0} and L_{1}) are calculated, respectively. However, because u = 0 under H_{0} lies on the boundary of the parameter space, the log-likelihood ratio calculated,

LR = –2 log(L_{0}/L_{1}),

may not follow a standard chi-square distribution. Self and Liang

Similar tests can be performed for the individual HWD coefficients, D_{1e}, D_{2e}, or D_{12}, or their combinations, by formulating the corresponding null hypotheses. Under the alternative hypothesis H_{1} associated with each null hypothesis considered, the likelihood is calculated. The LR value calculated is thought to be asymptotically chi-square distributed with degrees of freedom equal to the difference in the number of parameters to be estimated between the alternative and null hypotheses.
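For a single parameter tested on the boundary of its range, such as a rate that cannot be negative, a widely used result attributed to Self and Liang is that the LR statistic is asymptotically a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom. The sketch below assumes that this result applies to the test of the methylation rate; the function names are illustrative.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0.0 else 1.0


def boundary_lrt_pvalue(loglik_null, loglik_alt):
    """P-value of an LR test for one boundary parameter (0.5*chi2_0 + 0.5*chi2_1)."""
    lr = max(0.0, -2.0 * (loglik_null - loglik_alt))
    if lr == 0.0:
        return 1.0  # the point mass at zero makes every LR of 0 non-significant
    return 0.5 * chi2_sf_df1(lr)


# An LR of about 2.7055 corresponds to p = 0.05 under the mixture,
# compared with p = 0.10 under an ordinary chi-square with 1 df.
p_value = boundary_lrt_pvalue(loglik_null=-101.35275, loglik_alt=-100.0)
```

Using the ordinary chi-square reference distribution in this situation would make the test conservative, because it doubles the p-value.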

Genetic and epigenetic effect

We assume that the study population is investigated under a uniform condition so that the phenotypic variation can be simply partitioned into genetic/epigenetic components and errors. There are only three genotypes, A_{1}A_{1}, A_{1}A_{2}, and A_{2}A_{2}, prior to DNA methylation. Let a denote the additive effect due to the substitution of allele A_{1} by A_{2} or vice versa and d denote the dominance effect due to the interaction between the two alleles.

As described above, allele A_{1} is assumed to be methylated into the epiallele A_{e}. The values of six distinguishable genetic and epigenetic types are expressed as

where the genotypic value of the trait is decomposed into different components, i.e., the overall mean (μ), the additive effects due to the substitution of allele A_{1} (a_{1}) and of epiallele A_{e} (a_{e}) by allele A_{2}, and the dominance effects due to the interaction between allele A_{1} and epiallele A_{e} (d_{1e}), between allele A_{1} and allele A_{2} (d_{12}) and between allele A_{2} and epiallele A_{e} (d_{2e}).

Let y_{i} denote the phenotypic value of the trait for individual i.

Each of these effects (10) – (14) can be tested by the log-likelihood ratio approach. For an epigenetic study, we are more interested in testing the epigenetic effect of the nucleotide site, a_{e}, and the dominance effects due to the interactions between the alleles and the epiallele, d_{1e} and d_{2e}. The log-likelihood ratio test statistic for each hypothesis test is thought to be asymptotically chi-square distributed with degrees of freedom equal to the difference in the number of parameters to be estimated between the alternative and null hypotheses.

Genetic and epigenetic variation

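We first give the genetic variance explained by the nucleotide site studied prior to DNA methylation. By defining a new parameter called the average effect, α = a + (q – p)d, the genetic variance is partitioned as

σ_{g}^{2} = σ_{a}^{2} + σ_{d}^{2},

where σ_{a}^{2} = 2pqα^{2} is the additive genetic variance depending on both a and d, and σ_{d}^{2} = (2pqd)^{2} is the dominance genetic variance depending only on d. The additive genetic variance depends only on a when alleles A_{1} and A_{2} occur at the same frequency.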
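The classical single-site partition can be verified numerically. The sketch below assumes the standard quantitative genetic coding in which the three genotypes A1A1, A1A2 and A2A2 have values a, d and –a around the mean, and the average effect is a + (q – p)d; these definitions and the function names are assumptions drawn from standard theory rather than from the text above.

```python
def partitioned_variance(p, a, d):
    """Additive and dominance variance of a biallelic site at HWE."""
    q = 1.0 - p
    alpha = a + (q - p) * d          # average effect of allele substitution
    var_additive = 2.0 * p * q * alpha ** 2
    var_dominance = (2.0 * p * q * d) ** 2
    return var_additive, var_dominance


def direct_variance(p, a, d):
    """Genetic variance computed directly from genotype frequencies and values."""
    q = 1.0 - p
    freqs = (p * p, 2.0 * p * q, q * q)
    values = (a, d, -a)
    mean = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))


var_a, var_d = partitioned_variance(p=0.4, a=0.2, d=0.05)
total = direct_variance(p=0.4, a=0.2, d=0.05)
```

The two routes agree for any p, a and d, and at p = 0.5 the additive variance no longer involves d.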

In what follows, we model how the epigenetic change contributes to the genetic variance of a complex trait based on the frequencies (1) and values of genotypes/epigenotypes (9). The total genetic variation among the six genotypes/epigenotypes is derived as

where

It can be seen from equation (16) that the total genetic variance includes 15 different parts, i.e.,

Here, we define a new heritability, called the epigenetic heritability, which describes the proportion of the phenotypic variance explained by the effect of the epiallele and its interactions with the other effects, expressed as

Also, we use the proportion of the epigenetic variance to the total genetic variance to describe the relative contribution of epigenetic methylation to the overall genetic variance, expressed as

These two parameters can be used to assess the contribution of DNA methylation to the total phenotypic variation of a quantitative trait.
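As a rough illustration of how these two ratios are obtained, the sketch below computes the total genetic variance directly as the frequency-weighted variance of the six genotypic and epigenotypic values, and then forms the two proportions from a supplied epigenetic variance. The epigenetic variance is taken as an input because its component-wise expression is not reproduced here. All names and numerical values are hypothetical, and the phenotypic variance is assumed to be the sum of the genetic and residual variances.

```python
def total_genetic_variance(freqs, values):
    """Frequency-weighted variance of genotypic/epigenotypic values."""
    mean = sum(freqs[g] * values[g] for g in freqs)
    return sum(freqs[g] * (values[g] - mean) ** 2 for g in freqs)


def epigenetic_shares(var_epigenetic, var_genetic, var_residual):
    """Return (epigenetic heritability, share of epigenetic in genetic variance)."""
    heritability = var_epigenetic / (var_genetic + var_residual)
    share_of_genetic = var_epigenetic / var_genetic
    return heritability, share_of_genetic


# Hypothetical frequencies (equilibrium with allele frequencies 0.32, 0.08, 0.60)
# and hypothetical genotypic values, for illustration only.
freqs = {"A1A1": 0.1024, "A1Ae": 0.0512, "AeAe": 0.0064,
         "A1A2": 0.3840, "A2Ae": 0.0960, "A2A2": 0.3600}
values = {"A1A1": 1.20, "A1Ae": 1.10, "AeAe": 1.05,
          "A1A2": 1.05, "A2Ae": 1.00, "A2A2": 0.80}
var_g = total_genetic_variance(freqs, values)
h2_epi, share = epigenetic_shares(var_epigenetic=0.01, var_genetic=0.04, var_residual=0.06)
```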

Numerical analysis

In this section, we performed numerical analyses to investigate how epigenetic marks contribute to the heritability of a complex trait. The occurrence of epigenetic marks is described by population genetic parameters including the occurrence rate of the epiallele and its Hardy-Weinberg disequilibria with unmarked alleles. The effect of epigenetic marks can be specified by quantitative genetic parameters including the epigenetic effect of the epiallele and its interactions with other effects. As analyzed above, population genetic parameters (p, q, u, D_{1e}, D_{2e}, D_{12}) and quantitative genetic parameters (a_{1}, a_{e}, d_{1e}, d_{2e}, d_{12}) contribute to the genetic variance in a complex way (16). We analyze the contribution of epigenetic marks by separately investigating how these population and quantitative genetic parameters affect the proportion of the epigenetic variance in the total genetic variance.

Population genetic effect

Suppose there is a study population in which methylated sites are observed for a phenotypic trait. Consider a nucleotide site with two alleles A_{1} and A_{2}, one of which, say A_{1}, is methylated at a rate u, with HWD coefficients D_{1e}, D_{2e} and D_{12} as follows:

Because of DNA methylation, the genetic variance explained by the site changes. By fixing the quantitative genetic parameters, we quantitatively examined the impacts of different occurrence rates of methylation and different HWD coefficients on the epigenetic variance. A small occurrence rate may lead to the formation of substantial epigenetic variance, although this phenomenon depends on the degree of disequilibrium between the two original alleles following methylation (Figure

**Change of the proportion of the epigenetic variance over the total genetic variance as a function of the occurrence rate of methylation in a natural population.** The total genetic and epigenetic variances are calculated by assuming population genetic parameters (p, q, u, D_{1e}, D_{2e}, D_{12}) with p = 0.4 and q = 0.6 (allowing D_{12} to change) and quantitative genetic parameters (a_{1}, a_{e}, d_{1e}, d_{2e}, d_{12}) ≡ (0.4, 0.05, 0.05, 0.05, 0.05).

**Change of the proportion of the epigenetic variance over the total genetic variance as a function of Hardy-Weinberg disequilibrium (HWD) coefficients formed between the original allele and epiallele in a natural population after DNA methylation.** The total genetic and epigenetic variances are calculated by assuming population genetic parameters (p, q, u, D_{1e}, D_{2e}, D_{12}) with p = 0.4, q = 0.6 and D_{12} = 0 (allowing D_{1e} and D_{2e} to change) and quantitative genetic parameters (a_{1}, a_{e}, d_{1e}, d_{2e}, d_{12}) ≡ (0.4, 0.05, 0.05, 0.05, 0.05).

Quantitative genetic effect

By fixing population genetic parameters, the influence of genetic effects triggered by the epiallele was investigated. A small value of the additive effect a_{e} formed by the epiallele brings about considerable epigenetic variance (Figure a_{e} values. The epigenetic variance is also remarkably affected by the dominance effect between the original alleles and epiallele (Figure

**Change of the proportion of the epigenetic variance over the total genetic variance as a function of the additive genetic effect due to the substitution of the original allele by the epiallele.** The total genetic and epigenetic variances are calculated by assuming population genetic parameters (p, q, u, D_{1e}, D_{2e}, D_{12}) ≡ (0.4, 0.6, 0.2, 0, 0, 0) and quantitative genetic parameters (a_{1}, a_{e}, d_{1e}, d_{2e}, d_{12}) ≡ (a_{1}, a_{e}, 0.05, 0.05, 0.05) (allowing a_{1} and a_{e} to change).

**Change of the proportion of the epigenetic variance over the total genetic variance as a function of the dominance genetic effect due to the interaction between the original allele and epiallele.** The total genetic and epigenetic variances are calculated by assuming population genetic parameters (p, q, u, D_{1e}, D_{2e}, D_{12}) ≡ (0.4, 0.6, 0.2, 0.01, 0.01, 0) and quantitative genetic parameters (a_{1}, a_{e}, d_{1e}, d_{2e}, d_{12}) ≡ (0.08, 0.12, d_{1e}, d_{2e}, d_{12}) (allowing d_{1e}, d_{2e} and d_{12} to change).

Computer simulation

Our model allows the estimation and testing of epigenetic effects. We carried out simulation studies to examine the statistical properties of the model. A study population was simulated by assuming a set of population and quantitative genetic parameters and a normally distributed residual error with mean zero and variance scaled under a range of trait heritabilities. As expected, the estimation precision increases with increasing sample size and heritability. A sample size of 400 is sufficient to provide reasonable estimates of all population genetic parameters (Table

| Parameter | **True** | **H**: MLE (SD) | **H**: MLE (SD) | **H**: MLE (SD) |
|---|---|---|---|---|
| u | 0.1 | 0.099 (0.019) | 0.100 (0.017) | 0.099 (0.020) |
| p | 0.4 | 0.399 (0.020) | 0.400 (0.022) | 0.403 (0.018) |
| D_{12} | 0.01 | 0.011 (0.010) | 0.008 (0.011) | 0.009 (0.012) |
| D_{1e} | 0.01 | 0.010 (0.003) | 0.010 (0.003) | 0.010 (0.003) |
| D_{2e} | 0.01 | 0.010 (0.004) | 0.010 (0.005) | 0.010 (0.004) |
| μ | 1 | 1.002 (0.096) | 1.009 (0.066) | 1.002 (0.043) |
| a_{1} | 0.2 | 0.201 (0.113) | 0.193 (0.085) | 0.198 (0.049) |
| a_{e} | 0.05 | 0.060 (0.181) | 0.064 (0.134) | 0.054 (0.080) |
| d_{12} | 0.05 | 0.050 (0.076) | 0.049 (0.055) | 0.049 (0.032) |
| d_{1e} | 0.05 | −0.015 (0.485) | −0.008 (0.401) | 0.010 (0.267) |
| d_{2e} | 0.05 | 0.027 (0.279) | 0.047 (0.171) | 0.042 (0.116) |

| Parameter | **True** | **H**: MLE (SD) | **H**: MLE (SD) | **H**: MLE (SD) |
|---|---|---|---|---|
| u | 0.1 | 0.101 (0.014) | 0.101 (0.013) | 0.102 (0.013) |
| p | 0.4 | 0.401 (0.012) | 0.400 (0.012) | 0.401 (0.011) |
| D_{12} | 0.01 | 0.010 (0.007) | 0.009 (0.007) | 0.010 (0.007) |
| D_{1e} | 0.01 | 0.010 (0.003) | 0.010 (0.002) | 0.010 (0.002) |
| D_{2e} | 0.01 | 0.010 (0.003) | 0.010 (0.003) | 0.010 (0.003) |
| μ | 1 | 1.003 (0.053) | 0.996 (0.039) | 1.000 (0.027) |
| a_{1} | 0.2 | 0.202 (0.067) | 0.207 (0.049) | 0.200 (0.031) |
| a_{e} | 0.05 | 0.051 (0.099) | 0.037 (0.068) | 0.050 (0.050) |
| d_{12} | 0.05 | 0.051 (0.048) | 0.047 (0.035) | 0.051 (0.023) |
| d_{1e} | 0.05 | 0.056 (0.269) | 0.046 (0.191) | 0.038 (0.122) |
| d_{2e} | 0.05 | 0.048 (0.158) | 0.057 (0.111) | 0.054 (0.081) |

The MLEs of parameters and their standard deviations (in parentheses) were calculated from 200 simulation replicates.
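The simulation design described above, in which genotypes are drawn from assumed frequencies and a normal residual is scaled to reach a target heritability, can be sketched as follows. This is a minimal illustration rather than the authors' simulation code: heritability is taken here as the ratio of genetic to total variance, and the frequencies, genotypic values and function names are hypothetical.

```python
import random


def residual_variance(genetic_variance, heritability):
    """Residual variance giving the requested ratio of genetic to total variance."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    return genetic_variance * (1.0 - heritability) / heritability


def simulate_phenotypes(freqs, values, heritability, n, seed=0):
    """Draw n (type, phenotype) pairs with normally distributed residuals."""
    rng = random.Random(seed)
    types = list(freqs)
    mean = sum(freqs[g] * values[g] for g in types)
    var_g = sum(freqs[g] * (values[g] - mean) ** 2 for g in types)
    sd_e = residual_variance(var_g, heritability) ** 0.5
    sampled = rng.choices(types, weights=[freqs[g] for g in types], k=n)
    return [(g, values[g] + rng.gauss(0.0, sd_e)) for g in sampled]


# Hypothetical frequencies and genotypic values, for illustration only.
freqs = {"A1A1": 0.1024, "A1Ae": 0.0512, "AeAe": 0.0064,
         "A1A2": 0.3840, "A2Ae": 0.0960, "A2A2": 0.3600}
values = {"A1A1": 1.20, "A1Ae": 1.10, "AeAe": 1.05,
          "A1A2": 1.05, "A2Ae": 1.00, "A2A2": 0.80}
data = simulate_phenotypes(freqs, values, heritability=0.4, n=400)
```

Repeating such a draw many times and refitting the model on each replicate gives the empirical means and standard deviations of the estimates.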

We also investigated the power of detecting epiallelic HWD occurrence and epigenetic effects as well as the false positive rates for epigenetic effect identification under different heritabilities and sample sizes. Epigenetic effects were tested by formulating H_{0}: a_{e} = d_{1e} = d_{2e} = 0 vs. H_{1}: at least one of the effects in H_{0} is not equal to zero, and comparing the resulting log-likelihood ratio test statistic with the critical threshold of a chi-square distribution with three degrees of freedom. The proportion of simulation replicates that reject the null hypothesis is used as the empirical power of the model. The power of epigenetic effect detection is very sensitive to the magnitude of the epigenetic effect, heritability and sample size (Table

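| | **n** | **a_{e}** | **H** | **H** | **H** |
|---|---|---|---|---|---|
| Power | 400 | 0.05 | 0.055 | 0.090 | 0.160 |
| | 1000 | 0.05 | 0.090 | 0.125 | 0.355 |
| | 2000 | 0.05 | 0.115 | 0.205 | 0.630 |
| | 5000 | 0.05 | 0.215 | 0.415 | 0.975 |
| | 1000 | 0.1 | 0.085 | 0.255 | 0.780 |
| | 2000 | 0.1 | 0.265 | 0.525 | 0.975 |
| | 5000 | 0.1 | 0.495 | 0.950 | 1.00 |
| FPR | 400 | 0.05 | 0.050 | 0.045 | 0.065 |
| | 1000 | 0.05 | 0.060 | 0.025 | 0.045 |
| | 2000 | 0.05 | 0.030 | 0.010 | 0.045 |
| | 5000 | 0.05 | 0.085 | 0.050 | 0.070 |
| | 1000 | 0.1 | 0.045 | 0.040 | 0.040 |
| | 2000 | 0.1 | 0.055 | 0.025 | 0.030 |
| | 5000 | 0.1 | 0.050 | 0.020 | 0.045 |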
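The empirical power and false positive rate reported above are proportions of replicates whose LR statistic exceeds the chi-square threshold with three degrees of freedom. The sketch below shows one way to compute such a proportion with the standard library only, using the closed-form upper tail of the chi-square distribution for three degrees of freedom. The function names and the example LR values are hypothetical.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def rejection_rate(lr_statistics, alpha=0.05):
    """Proportion of replicates whose LR statistic is significant at level alpha."""
    rejected = sum(1 for lr in lr_statistics if chi2_sf_df3(lr) < alpha)
    return rejected / len(lr_statistics)


# Hypothetical LR statistics from four replicates; two exceed the 5% threshold.
rate = rejection_rate([1.0, 9.0, 12.0, 3.0])
```

Applied to replicates simulated with nonzero epigenetic effects the proportion estimates power; applied to replicates simulated under the null hypothesis it estimates the false positive rate.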

Implementing the epigenetic model into GWAS

The proposed epigenetic model can be implemented in genome-wide association studies (GWAS). In GWAS, it is likely that we have a million methylated sites detected throughout the entire genome on a much smaller number of samples. Moreover, samples collected for human GWAS are highly heterogeneous in terms of genetic background, gender, age, race, and many other demographic characteristics. These demographic factors should be modeled as covariates. For a single methylated site, we can build a linear model to describe the phenotypic value of individual i as

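where x_{i1}, …, x_{i5} are the indicator variables for subject i that code the five genetic and epigenetic effects; z_{ir} is the value of the r-th continuous covariate for subject i and β_{r} is the effect of the r-th continuous covariate; γ_{sl} is the effect of level l (l = 1, …, L_{s}) of the s-th categorical covariate, where L_{s} is the number of levels for the s-th categorical covariate; w_{isl} is an indicator variable of subject i for level l of the s-th categorical covariate; and e_{i} is a random error.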

A standard multiple linear regression approach can be used to estimate all the effects described in model (19). If the test is made individually for each of the methylated sites, the significance of each effect should be adjusted by multiple comparison approaches such as Bonferroni or FDR.
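The two adjustments named above can be sketched as follows, taking FDR control to mean the Benjamini-Hochberg step-up procedure, which is one common choice and an assumption here. The function names and p-values are illustrative.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a test when its p-value is at most alpha divided by the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= alpha * k / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for eight methylated sites.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
by_bonferroni = bonferroni(pvals)
by_fdr = benjamini_hochberg(pvals)
```

With a million sites the Bonferroni threshold becomes very stringent, which is why FDR control is often preferred when many true effects are expected.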

Analyzing a single methylated site at a time limits statistical inference about the overall genetic and epigenetic architecture of complex phenotypes. Such a comprehensive picture is best obtained by analyzing all sites simultaneously. Li et al.

Discussion

Epigenetic alterations have been increasingly recognized to play an important role in generating and maintaining quantitative genetic variation for complex phenotypes underlying physiology and disease.

Numerical analysis shows that a small incidence of DNA methylation, as well as a small effect due to methylation alterations, could lead to a substantial increase in genetic variance, suggesting that epigenetic marks may be an important cause of genetic diversity in nature. Given this finding, the neglect of epigenetic variants in many current GWAS may partly explain the problem of missing heritability.

The model only considers a single methylated site. However, there is no technical difficulty in extending the model to explore two or more sites at the same time, which may interact with each other to produce a complex network of epistasis, in addition to the effects (a_{1}, a_{e}, d_{1e}, d_{2e}, d_{12}) for each site. In this case, an exponentially increasing sample size and more precise phenotypic measurement (aimed at increasing the trait’s heritability) are needed. For the methylated population, the original HWE assumption may be violated, in which case it is not possible to use gametic linkage disequilibria to specify the association between the two sites. Wu et al.

Epigenetic changes may be an adaptation to environmental perturbations.

Competing interests

The authors declare that there are no competing interests.

Authors’ contributions

ZW designed the algorithm and conducted the simulation experiments. ZHW derived the statistical model for hypothesis tests. JW and YHS participated in computer simulation. JZ provided biological insights for the statistical model. DL supervised the project. RW conceived of the model, designed the computer simulation and wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work is partially supported by NSF/IOS-0923975, NIH/UL1RR0330184 and the Nantong “Jianghai Elites” program.