The genetic basis of blood pressure often involves multiple genetic factors and their interactions with environmental factors. Gene-environment interaction is assumed to play an important role in determining individual blood pressure variability. Older people are more prone to high blood pressure than younger ones and the risk may not display a linear trend over the life span. However, which gene shows sensitivity to aging in its effect on blood pressure is not clear. In this work, we allowed the genetic effect to vary over time and propose a varying-coefficient model to identify potential genetic players that show nonlinear response across different age stages. We detected 2 novel loci, gene MIR1263 (a microRNA coding gene) on chromosome 3 and gene UNC13B on chromosome 9, that are nonlinearly associated with diastolic blood pressure. Further experimental validation is needed to confirm this finding.
The genetic basis of a complex trait often involves multiple genetic factors functioning in a coordinated manner. The extent to which our genetic blueprint expresses also depends on the interactions between genetic and environmental factors. Increasing evidence shows the importance of gene-environment (G × E) interactions in determining the risk of a variety of diseases such as respiratory diseases , obesity , and psychiatric disorders . For a review of G × E interaction, see the work of Hunter . The empirical evidence underscores the importance of developing novel statistical approaches to identify major genetic players that are sensitive to environmental stimuli and to further understand how they function.
Blood pressure is a heritable trait influenced by several biological pathways sensitive to environmental stimuli. High blood pressure, or hypertension, affects more than 1 billion people worldwide. It damages an individual's body in many ways over time, leading to heart disease, stroke, kidney failure, and other health problems . Age is known to be a risk factor for high blood pressure. Systolic blood pressure rises with age, whereas the diastolic blood pressure tends to fall. For people with preexisting high blood pressure, this age-related pattern occurs even if the blood pressure is well controlled with medication . The reasons why blood pressure changes with age are still poorly understood, but are a topic of intense research. Thus, age should be an important predictor when searching for genetic players responsible for hypertension. However, few studies have considered an age-dependent mechanism in their analysis.
The genetic response to age in blood pressure fits in well with the classical G × E interaction framework. G × E interaction typically refers to the manner in which genotypes influence phenotypes differently in different environments . From a biological point of view, G × E interaction can be better viewed as the genetic responses to environment changes or stresses . Statistically, interaction is considered as a departure from additivity when fitting a linear regression model with 1 or more product terms, for example,
where Y is a quantitative trait (diastolic blood pressure in this analysis), G is the genetic variable, X is the environmental variable (age), and is the error term. This is a classical linear model for G × E interaction analysis. As can be seen, equation (1) automatically assumes a linear interaction mechanism between G and X because the coefficient for G is a linear function in X. However, the contribution of the same gene to blood pressure may be quite different at different age levels. This nonlinear penetrance can be well understood by a statistical varying-coefficient (VC) model . VC models allow the coefficients to change smoothly and nonlinearly with other variables so that one can explore the dynamic feature of a response over time with great flexibility and nice interpretability .
In this work, we applied VC models to detect genetic variants associated with diastolic blood pressure from the Genetic Analysis Workshop 18 (GAW18) data with 142 unrelated individuals. We allowed the contribution of genetic variants to blood pressure to vary over time via varying coefficients. We further proposed a sequence of hypothesis tests to evaluate whether the effect of a genetic variant is sensitive to aging, and if it is, is it in a linear or nonlinear fashion? Using this analysis, we identified 2 novel loci that show nonlinear effects over time to affect blood pressure.
The nonlinear VC model is defined as
for given and the response with and is the conditional variance function. The mean function is defined as where is a smoothing function in . Under the VC modeling framework, the effect of a gene is allowed to vary as a function of environmental factors, either linearly or nonlinearly, captured by the model itself. Thus, the VC model has the potential to dissect the nonlinear penetrance of genetic variants. Here we also allow nonlinear function of with modeled by . This nonlinear term adjusts the nonlinear effect of when estimating the nonlinear effect of . If we take equation (1) is just a special case of the VC model when .
The following list shows all 4 mean models involved in our analysis.
We first assess whether the genetic coefficients vary with × by formulating the following hypotheses,
Rejecting the null indicates that potential gene-age (G × age) interaction may exist. Otherwise, we conclude there is no G × age effect and we fit mean model 2 to test for association. Because the traditional linear interaction model given in equation (1) is a special case of the proposed VC model, we next test significance of a linear effect if the above null is rejected, by formulating the following hypotheses,
Failure to reject the null indicates that there is a linear G × age effect, so we fit mean model 3 to assess association. Otherwise, we conclude that the G × age interaction is nonlinear. We then assess the nonlinear genetic effect over age by formulating the hypotheses,
The rejection of the null indicates that the genetic effect is sensitive to age in a nonlinear fashion. The sequence of hypothesis tests stated above was suggested by Ma et al  for optimal power to detect association.
We fit the varying coefficients with a B-spline technique for both and functions. The variable was first transformed to make it more evenly distributed on each subinterval used in the B-spline smoothing technique. The great advantages of B-spline estimation over other nonparametric techniques are simple implementation and fast computing . For each single-nucleotide polymorphism (SNP), in mean model 1 was estimated by considering the following least square problem,
The estimated has the form where N is the number of interior knots, p is the degree of B-splines, and is the set of basis B-splines with degree p. For selecting the number of knots N and the degree p of the B-splines, we used the Bayesian information criterion (BIC),
The error term can be assumed homogeneous following a normal distribution or heterogeneous without assuming a parametric distribution. When the homogeneous assumption is made, the likelihood ratio test can be applied to assess the significance of . Otherwise can be nonparametrically estimated using the spline approximation and defining as the spline estimate, where minimizes that is, where and
Thus we have and Wild bootstrap can be applied to assess the significance of .
We applied the above models to the GAW18 genome-wide association data. We focused our analysis on diastolic blood pressure (DBP) to identify any genetic players that can explain the variability of DBP triggered by nonlinear genetic penetrance over time. We treated DBP as the response Y and age as the X variable. The genetic variable G is coded following an additive model, that is, G = 1, 0, −1, corresponding to genotype AA, Aa, aa, respectively. In total, 142 individuals and 388,099 SNPs were left after removing SNPs with a minor allele frequency less than 0.05. These SNPs are distributed on odd-numbered chromosomes from chromosome 1 to chromosome 21.
Figure 1 shows the Manhattan plots of p values when assessing significance by fitting different models. The overall p value patterns for the 3 models are quite similar. Two known and 1 unknown gene show strong nonlinear genetic effects (indicated by small p values in columns 7 and 8 in Table 1). A strong signal was detected in chromosome 3 containing a microRNA coding gene, MIR1263, and in chromosome 9 containing the gene UNC13B. MIR1263 may play a regulatory role. The signals at gene UNC13B are quite consistent for the 3 models. This gene was reported to be related with increased risk of nephropathy in patients with type 1 diabetes. Nephropathy accounts for 40% of end-stage renal disease and is associated with high cardiovascular morbidity and mortality .
Figure 1. Manhattan plots of the p values for assessing significance of:.A, by fitting mean model 2; (B) by fitting mean model 3; and (C) by fitting mean model 4. Solid red, blue, and gray lines correspond to significance levels of , , and , respectively.
Table 1. List of SNPs with p value <5 × 10−7
Table 1 lists SNPs with p values less than 5 × 10−7. These SNPs show strong nonlinear effects over time to affect DBP, especially for SNPs in chromosome 3 (indicated by small and in Table 1). These SNPs can be easily missed by fitting traditional linear models. For illustration purposes, Figure 2 shows the fitted mean DBP function and the genetic effects of 2 SNPs in genes MIR1263 and UNC13B. For SNP rs9863717 in gene MIR1263, DBP decreases after age 55 years for individuals carrying genotype GG, whereas it increases for individuals carrying genotype AA. Thus, for a senior person who carries genotype AA at this locus, the chance to develop hypertension is higher than for others. For SNP rs10972462 in gene UNC13B, large DBP variability among the 3 genotype groups is observed after age 50 years, and a decreasing pattern is observed roughly after age 65 years. Among the 3 genotype groups, DBP is higher in the GG group, followed by the GA and AA groups. From the prevention and therapeutic point of view, people carrying genotype GG at rs10972462 locus should pay special attention after age 50 years, and so should those carrying AA genotype at rs9863717 locus after age 65 years.
Figure 2. Fitted mean function and the estimated varying coefficients. Shown are effects for SNP rs9863717 in gene MIR1263 (chr3) and SNP rs10972462 in gene UNC13B (chr9). The observed data are shown in lighter color in the background.
Discussion and conclusions
In this work, we proposed to model the genetic effect as a nonlinear function of age. It is clear that the classical linear model, with or without interaction, is just a special case of the VC model. However, the VC model has the flexibility to capture potential nonlinear genetic effects over time. Evidence of nonlinear genetic effects has been reported previously. For example, Laitala et al  reported the curvilinear genetic effect on interindividual differences in coffee consumption over age. In a study of congenital scoliosis in mice , the authors found that mutations in genes HES7 and MESP2 are sensitive to different degrees of hypoxia, which is responsible for a nonlinear increase in the severity and penetrance of vertebral defects. Our analysis identified 2 novel loci associated with DBP with nonlinear genetic effects. They can be missed by the traditional linear interaction model. However, because statistical significance does not necessarily imply causality, further experimental validation is needed to confirm the finding.
As shown in Ma et al , the VC model loses power because of high degrees of freedom in the test in cases where the genetic effect is not very complex, such as in a linear form. Thus one should assess constant or linear effects first, followed by fitting the corresponding model suggested by the results of the tests. In this analysis, we found that the coefficients are constant for most SNPs.
Note that the function models the overall mean of DBP over time when there is no genetic effect. When a linear structure for is forced, we observe inflated signals for testing . Thus, the incorporation of this nonlinear function can largely reduce false positives. In this analysis we coded the genetic variable G in an additive fashion, although other disease models such as dominant or recessive can also be assumed, while the optimal one can be selected based on a model selection criterion such as BIC.
The authors declare that they have no competing interests.
HLW conducted the analysis and drafted the manuscript; TH, CW, and PSZ participated the discussion and helped with the analysis; YHC conceived the idea and wrote the manuscript. All authors read and approved the final manuscript.
We greatly appreciate the GAW18 data provider and the National Institutes of Health grant that supports GAW18. We also wish to thank the anonymous reviewer for the insightful comments that greatly improved the manuscript. This work was partially supported by National Science Foundation grants DMS-1209112 and IOS-1237969. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
Am Nat 1952, 86:293-298. Publisher Full Text
Ann Stat 1993, 21:1926-1947. Publisher Full Text
Sparrow DB, Chapman G, Smith AJ, Mattar MZ, Major JA, O'Reilly VC, Saga Y, Zackai EH, Dormans JP, Alman BA, et al.: A mechanism for gene-environment interaction in the etiology of congenital scoliosis.