Abstract
Background
The data arising from a longitudinal familial study have a complex correlation structure that cannot be modeled using classical methods for the analysis of familial data at a single time point.
Methods
To fit the longitudinal systolic blood pressure (SBP) pedigree data arising from the Framingham Heart Study, we proposed to use multilevel modeling. That approach was used to distinguish multiple levels of information, with individual repeated measurements (Level 1) being made within individuals (Level 2), and individuals clustered within pedigrees (Level 3). Residuals from the subject-specific and pedigree-specific regression models were summed both for the mean SBP and for the slope of SBP change over time, in order to define two new outcomes that were then used in a genome-wide linkage analysis.
Results
Evidence for linkage for the two outcomes (mean SBP and slope) was found in several chromosomal regions, with maximum LOD scores of 3.6 on chromosome 8 and 3.5 on chromosome 17 for the mean SBP, and 2.5 on chromosome 1 for SBP slope. However, the linkage on chromosome 8 was only detected when the sample was restricted to subjects between ages 25 and 75 with at least four exams (Cohort 1) or three exams (Cohort 2).
Discussion
Multilevel modeling is a powerful approach to detect genes involved in complex traits when longitudinal data are available. It allows a complex hierarchical data structure to be taken into account and therefore permits a better partitioning of random within-individual variation from other sources of variability (genetic or non-genetic).
Background
The Framingham Heart Study provides long-term repeated measurements of blood pressure and other phenotypes in two large cohorts of related individuals. Longitudinal studies are efficient designs for the investigation of individual changes over time. In the context of familial studies, such designs might be of particular interest to assess the proportion of the trait variability explained by within-individual variation or other sources of variation. However, the data arising from a longitudinal familial study have a complex correlation structure that cannot be modeled using classical methods for the analysis of familial data at a single time point. In this study, we proposed to use multilevel modeling to fit the complex data structure arising from the Framingham Heart Study. Multilevel modeling, also known as hierarchical regression, generalizes ordinary regression modeling to distinguish multiple levels of information in a model [1]. It may be appropriate for the Framingham Heart Study data, which form a natural hierarchy with individual repeated measurements (Level 1) being made within individuals (Level 2), and individuals clustered within pedigrees (Level 3). The use of appropriate random effects at each level allows one to adjust for the influence of a wide variety of correlation structures and to estimate variances, covariances, and correlations, which are of particular interest in familial studies. In this paper, multilevel models are first used to fit the repeated systolic blood pressure (SBP) measurements. Residuals from the subject-specific and pedigree-specific regression models were then summed both for the mean SBP and for the slope of SBP change over time, in order to define two new outcomes that were used in a genome-wide linkage analysis. Both phenotypes are of interest because genes involved in the variation of SBP with time could differ from genes affecting long-term mean SBP.
Methods
Data
The Framingham Heart Study data include 330 pedigrees originally selected for a genome-scan analysis. The pedigrees consisted of 4692 subjects, of whom 2885 have participated in the Framingham Heart Study. Longitudinal SBP data were analyzed for 25,263 examinations on 2662 individuals. Height, weight, gender, age, and hypertensive treatment information were required, but if height was missing, the most recent measurement was imputed. Because there might be important variation in individual SBP measurement among younger and older subjects, we also restricted the sample to individuals aged between 25 and 75 years, as in Levy et al. [2]. The following selection criteria were also defined: 1) there had to be at least 10 years between a subject's initial and final examinations within the age range; 2) at least four examinations within the age range were required for original cohort participants and at least three for offspring cohort participants [2]. Data from 24,840 examinations on 2530 individuals were available in the selected sample. For the genome-wide scan analysis, 1702 genotyped individuals were included (394 from Cohort 1 and 1308 from Cohort 2).
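The selection criteria above can be sketched as a small filter over examination records. This is only an illustration under stated assumptions: the record layout `(subject_id, cohort, age_at_exam)` is hypothetical, the age bounds are treated as inclusive, and the difference between the oldest and youngest in-range ages stands in for the time between the initial and final examinations.

```python
from collections import defaultdict

# Assumed thresholds, taken from the selection criteria in the text.
MIN_AGE, MAX_AGE = 25, 75          # age range, treated here as inclusive
MIN_SPAN_YEARS = 10                # minimum follow-up within the age range
MIN_EXAMS = {1: 4, 2: 3}           # cohort -> minimum number of in-range exams


def select_subjects(exams):
    """Return the ids of subjects meeting the selection criteria.

    `exams` is an iterable of hypothetical (subject_id, cohort, age) records.
    """
    ages_by_subject = defaultdict(list)
    cohort_of = {}
    for subject_id, cohort, age in exams:
        cohort_of[subject_id] = cohort
        if MIN_AGE <= age <= MAX_AGE:  # keep only exams inside the age range
            ages_by_subject[subject_id].append(age)

    selected = set()
    for subject_id, ages in ages_by_subject.items():
        enough_exams = len(ages) >= MIN_EXAMS[cohort_of[subject_id]]
        long_follow_up = max(ages) - min(ages) >= MIN_SPAN_YEARS
        if enough_exams and long_follow_up:
            selected.add(subject_id)
    return selected


# Made-up records for illustration only.
example_exams = [
    ("A", 1, 30), ("A", 1, 34), ("A", 1, 38), ("A", 1, 42),  # 4 exams, 12 years
    ("B", 1, 50), ("B", 1, 55), ("B", 1, 61),                # only 3 exams, Cohort 1
    ("C", 2, 28), ("C", 2, 33), ("C", 2, 40),                # 3 exams, 12 years
    ("D", 2, 70), ("D", 2, 74), ("D", 2, 78), ("D", 2, 82),  # 2 exams in range
]
selected_ids = select_subjects(example_exams)
print(sorted(selected_ids))
```

In this toy input, subjects A and C pass, B fails the four-exam requirement for Cohort 1, and D has too few examinations inside the age range.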
Multilevel analysis of the longitudinal SBP model
Let the random variable Y_{ijk} denote the SBP measurement at the i^{th} examination for the j^{th} individual in pedigree k. We then assume that Y_{ijk} satisfies the following general multilevel model:
Within-subject model – Level 1
where i = 1,...,21 for Cohort 1 subjects and i = {11, 15, 17, 19, 21} for Cohort 2 subjects. Age_{ijk}, BMI_{ijk}, and Treat_{ijk} are the age, body mass index, and hypertension treatment indicator (1 for treated subjects, 0 for untreated subjects) at the i^{th} exam for the j^{th} individual in pedigree k; \bar{Age}_{jk} and \bar{BMI}_{jk} are the mean values of age and BMI across all exams for the j^{th} individual; and ε_{ijk} are the error components that account for the within-individual variability. The ε_{ijk} are assumed to be normally distributed with mean vector zero and variance-covariance matrix Σ defined by a first-order autoregressive structure. The intercept b_{0jk} represents the average SBP for an untreated subject of average age and BMI across all of the subject's examinations. The regression coefficient b_{1jk} is used to model the linear variation of SBP with age. We found that every individual profile could be well approximated by a quadratic function of time, measured by the age at examination. We also tested a cubic effect, but it was not significant when we allowed the individual's linear time trend to differ between treatment groups (interaction between age and treatment). Random effects were added to reflect the natural heterogeneity in the population. In this model, both the intercept and the linear effect for age were allowed to vary across individuals, and the individual-specific regression coefficients (random effects) were defined at the second level:
Subject random-intercept model – Level 2
Subject random-slope model – Level 2
where \bar{Age} and \bar{BMI} are the sample means for age and body mass index, and Sex and Cohort are two indicator variables, with Sex coded 1 for males and 0 for females, and Cohort coded 0 for Cohort 1 subjects and 1 for Cohort 2 subjects. The random components u_{0jk} and u_{1jk} measure the variation of each individual's mean SBP and slope from their average in pedigree k. The intercept b_{00k} represents the average SBP in pedigree k for males in Cohort 1 with average age and BMI, and the intercept b_{10k} represents the average slope in pedigree k for males in Cohort 1 with average BMI. To account for the correlation of individuals within a pedigree, these two intercepts were allowed to vary between pedigrees. The random effects at different levels of the model are assumed to be independent.
Pedigree random-intercept model – Level 3
b_{00k} = β_{000} + v_{00k}, k = 1,...,N
Pedigree random-slope model – Level 3
The random components v_{00k} and v_{01k} measure the variation of each pedigree's mean SBP and mean slope from their average in the whole sample.
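For orientation, the three levels described above can be collected in a single display. This is a reconstruction from the verbal description, not a quotation of the original equations: the fixed-effect labels b_2 to b_5, b_{01} to b_{04}, b_{11} to b_{13}, and β_{100}, as well as the AR(1) parameters σ² and ρ, are assumed names, and the covariates entering the slope equation are inferred from the stated interpretation of b_{10k}.

```latex
% Sketch of the three-level model; coefficient labels are assumptions.
\begin{aligned}
% Level 1: within-subject model (quadratic in centered age, age-by-treatment interaction)
Y_{ijk} &= b_{0jk}
  + b_{1jk}\,(Age_{ijk} - \overline{Age}_{jk})
  + b_{2}\,(Age_{ijk} - \overline{Age}_{jk})^{2}
  + b_{3}\,(BMI_{ijk} - \overline{BMI}_{jk}) \\
 &\quad + b_{4}\,Treat_{ijk}
  + b_{5}\,(Age_{ijk} - \overline{Age}_{jk})\,Treat_{ijk}
  + \varepsilon_{ijk}, \\
% Assumed AR(1) form for the within-subject errors
\operatorname{Cov}(\varepsilon_{ijk}, \varepsilon_{i'jk}) &= \sigma^{2}\rho^{|i - i'|}, \\
% Level 2: subject random intercept and random slope
b_{0jk} &= b_{00k} + b_{01}\,Sex_{jk} + b_{02}\,Cohort_{jk}
  + b_{03}\,(\overline{Age}_{jk} - \overline{Age})
  + b_{04}\,(\overline{BMI}_{jk} - \overline{BMI}) + u_{0jk}, \\
b_{1jk} &= b_{10k} + b_{11}\,Sex_{jk} + b_{12}\,Cohort_{jk}
  + b_{13}\,(\overline{BMI}_{jk} - \overline{BMI}) + u_{1jk}, \\
% Level 3: pedigree random intercept and random slope
b_{00k} &= \beta_{000} + v_{00k}, \qquad
b_{10k} = \beta_{100} + v_{01k}, \qquad k = 1,\dots,N.
\end{aligned}
```

Substituting Levels 2 and 3 into Level 1 shows that each subject's deviation in mean SBP from the sample average is u_{0jk} + v_{00k}, and the deviation in slope is u_{1jk} + v_{01k}.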
Statistical tests in the multilevel model
Analyses were conducted in both the unselected and selected samples, with and without adjustment for BMI. Multilevel models were fitted using SAS PROC MIXED [3]. Parameter estimates were obtained by restricted maximum likelihood (REML) estimation. An F-statistic was used to test the significance of the fixed effects, with the number of degrees of freedom computed using the containment method [4]. The likelihood ratio statistic based on REML likelihoods was used to test the significance of the random effects. The null distribution of this statistic is a mixture of χ^{2}_{q} and χ^{2}_{q+1} with equal weights 0.5, where q and q + 1 are the number of random effects estimated under H_{0} and H_{1}, respectively.
Genome-wide linkage analysis
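As a minimal sketch of how a p-value could be computed from such an equal-weight mixture of two chi-square distributions, the following uses only the Python standard library. The function names and the example statistic are hypothetical; the chi-square tail is obtained from closed forms for 1 and 2 degrees of freedom plus the standard two-step recurrence in the degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 0."""
    if x <= 0:
        return 1.0  # every non-negative variable is >= a non-positive threshold
    if df == 0:
        return 0.0  # chi-square with 0 df is a point mass at zero
    half = x / 2.0
    # Closed-form starting values for df = 1 (odd) or df = 2 (even).
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1
    else:
        tail, k = math.exp(-half), 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


def mixture_p_value(lrt, q):
    """P-value under a 50:50 mixture of chi-square(q) and chi-square(q + 1)."""
    return 0.5 * chi2_sf(lrt, q) + 0.5 * chi2_sf(lrt, q + 1)


# Hypothetical likelihood ratio statistic for adding one random effect
# to a model that already contains one (q = 1).
print(mixture_p_value(6.2, 1))
```

Compared with referring the statistic to a single chi-square with q + 1 degrees of freedom, the mixture gives a smaller p-value for the same statistic, so ignoring the boundary constraint on variances would make the test conservative.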
We used the estimates of the random effects at the subject and pedigree levels to define two new outcomes that were used in the genome-wide linkage analysis. The two outcomes were defined as \hat{u}_{0jk} + \hat{v}_{00k} and \hat{u}_{1jk} + \hat{v}_{01k}, which measure the random variation of each individual's SBP mean and slope, respectively, from the sample average after adjustment for the fixed effects. A third outcome was also defined using the residuals from a sample-wide regression in which each individual's mean SBP (across all exams) was regressed on his or her mean age (centered), mean BMI (centered), gender, and cohort, as in Levy et al. [2]. Estimation of heritability and two-point linkage analyses were performed on the pedigree data using the variance component models implemented in the SOLAR package [5].
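The construction of the two linkage outcomes amounts to adding, for each subject, the estimated subject-level effect to the estimated effect of that subject's pedigree. A minimal sketch follows; the identifiers, the dictionary layout, and all numeric values are made up for illustration and do not come from the study.

```python
# Hypothetical estimated random effects from a fitted three-level model.
subject_effects = {          # subject id -> (u0_hat, u1_hat)
    "s1": (4.2, 0.10),
    "s2": (-1.5, -0.05),
    "s3": (0.8, 0.02),
}
pedigree_effects = {         # pedigree id -> (v00_hat, v01_hat)
    "p1": (2.0, 0.01),
    "p2": (-3.0, -0.02),
}
pedigree_of = {"s1": "p1", "s2": "p1", "s3": "p2"}  # subject -> pedigree


def linkage_outcomes(subject_effects, pedigree_effects, pedigree_of):
    """Sum subject- and pedigree-level effects into mean and slope outcomes."""
    outcomes = {}
    for subject, (u0, u1) in subject_effects.items():
        v00, v01 = pedigree_effects[pedigree_of[subject]]
        outcomes[subject] = {"mean_sbp": u0 + v00, "slope": u1 + v01}
    return outcomes


outcomes = linkage_outcomes(subject_effects, pedigree_effects, pedigree_of)
for subject, values in outcomes.items():
    print(subject, values)
```

Each subject's outcome thus reflects both the deviation from the pedigree average and the pedigree's deviation from the sample average, which is the quantity a linkage analysis across pedigrees needs.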
Results
Multivariate analysis of longitudinal SBP
All fixed effects included in the model were highly significant in the subject random-slope model (Table 1), except for gender. Most of the SBP variability (316.8 in Model 1a, Table 1) was explained by within-subject (140.8, 44%) and between-subject (146.2, 46%) variability in the mean SBP, and to a lesser extent by between-pedigree variability (27.6, <9%). Much less variability was explained by variability in the slope (0.17 + 0.008, <0.06%). Pedigree effects on mean SBP and SBP slope were more significant when the multilevel analyses were adjusted for body mass index. As shown in Figure 1, the multilevel model fits the data well, whereas the sample-wide regression does not capture all the SBP variability.
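The percentages quoted above follow from dividing each variance component by the total of 316.8. The arithmetic can be checked with a few lines; the component labels are descriptive names chosen here, and the values are those quoted in the text for Model 1a.

```python
# Variance estimates quoted in the text for Model 1a (Table 1).
total_variance = 316.8
components = {
    "within-subject": 140.8,
    "between-subject (mean SBP)": 146.2,
    "between-pedigree (mean SBP)": 27.6,
    "slope (subject + pedigree)": 0.17 + 0.008,
}


def percent_of_total(value, total):
    """Express one variance component as a percentage of the total."""
    return 100.0 * value / total


shares = {
    name: percent_of_total(value, total_variance)
    for name, value in components.items()
}
for name, share in shares.items():
    print(f"{name}: {share:.3f}%")
```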
Table 1. Estimates of multilevel model fixed effects and random effects variances (± SE) in the selected and unselected samples with or without adjustment for BMI
Heritability
Heritability estimates were 54.3% (SE = 3.1) and 55.6% (SE = 3.4) for the mean SBP, and 31.9% (SE = 3.5) and 28.9% (SE = 3.5) for SBP slope over time, in the unselected and selected samples, respectively. The heritability estimates for the subject-specific residuals from the sample-wide regression of the mean SBP were 47.7% (SE = 3.4) and 49.7% (SE = 3.8) in the unselected and selected samples, respectively.
Genome-wide linkage analysis
Evidence for linkage for the two outcomes (mean SBP and slope) was found in several chromosomal regions, with maximum LOD scores of 3.6 on chromosome 8 and 3.5 on chromosome 17 for the mean SBP and 2.5 on chromosome 1 for SBP slope (Table 2). However, linkage on chromosome 8 for the mean SBP was only found in the selected sample. The decrease in LOD score in the unselected sample on chromosome 17 was substantial in several pedigrees that included individuals with a single SBP measurement, as illustrated in Figure 2. Adjusting the analyses for BMI gave stronger evidence for linkage, which could suggest that BMI is determined by other genetic factors (Table 3). Not adjusting the analysis for treatment effect did not change the results for the mean SBP, but yielded lower LOD scores for SBP slope (Table 3).
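To give a rough sense of scale for these LOD scores, a LOD can be converted to a likelihood ratio chi-square statistic by multiplying by 2 ln(10). The sketch below then attaches a pointwise p-value under the assumption, not stated in the text for the linkage test, that the statistic follows a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom, as is commonly assumed when one variance component is tested at its boundary. The function names are hypothetical, and the p-values are not corrected for genome-wide multiple testing.

```python
import math


def lod_to_chi2(lod):
    """Convert a LOD score (log10 likelihood ratio) to a likelihood ratio statistic."""
    return 2.0 * math.log(10.0) * lod


def lod_p_value(lod):
    """Pointwise p-value assuming a 50:50 mixture of 0 and chi-square(1)."""
    if lod <= 0:
        return 1.0  # a non-positive LOD carries no evidence for linkage
    # For 1 df, P(chi-square >= x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(lod_to_chi2(lod) / 2.0))


# Maximum LOD scores quoted in the text.
for label, lod in [("chr 8, mean SBP", 3.6),
                   ("chr 17, mean SBP", 3.5),
                   ("chr 1, SBP slope", 2.5)]:
    print(f"{label}: chi2 = {lod_to_chi2(lod):.2f}, p = {lod_p_value(lod):.2e}")
```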
Discussion
Our study demonstrates the value of multilevel modeling in the search for genetic determinants of complex traits when longitudinal pedigree data are available. For the mean SBP, we were able to replicate the linkage result on chromosome 17 previously reported by Levy et al. [2] and to detect a linkage on chromosome 8 that had not been reported before. We also found suggestive evidence of linkage for both mean SBP and SBP slope in several other chromosomal regions, including chromosomes 1, 2, 3, 11, and 13. Using residuals from the multilevel model in a genome-wide linkage analysis gave stronger evidence for linkage than using residuals from a sample-wide regression as in Levy et al. [2]. This might be because the latter approach does not correctly account for within-individual and between-individual variability. Multilevel modeling, which can take into account the hierarchical structure of the data, may help disentangle the proportion of the trait variability explained by fundamental variation in the mean SBP and in the SBP slope from the proportion explained by random within-individual variability. A more general hierarchical structure could have included a nuclear family level nested within the pedigree level. However, such a multilevel model would be more difficult to fit. In our analysis we only included a fixed cohort effect that could account for differences between generations within a pedigree. Treating the pedigrees as random effects also allowed for between-pedigree heterogeneity in our model, which improved the accuracy of the random effect estimates at the individual level. Although there may be some concern about using a two-stage approach for detecting linkage, other studies based on similar strategies using linear mixed models in simulated data did not report an inflation of type I error for the test of linkage in the context of genome-wide linkage analysis [6,7].
The linkage on chromosome 17 for mean SBP was only found in the selected sample. An important decrease in LOD score (>0.1) in the unselected sample was observed in several pedigrees comprising individuals with a single extreme SBP measurement, as illustrated in Figure 2. This suggests that a single SBP measurement may not provide a reliable characterization of an individual, especially when a familial study of SBP is designed. Adjusting the analyses for BMI gave stronger evidence for linkage, which could suggest that BMI is determined by other genetic factors. No correction was applied to the SBP value of subjects who received a hypertensive treatment. The analyses with the multilevel model were adjusted for treatment effect so that the residuals obtained from this model correspond to the untreated group. Taking into account an interaction between age and treatment in the multilevel model may also have reduced the bias due to treatment effect. However, our linkage results were insensitive to whether the analyses were adjusted for treatment effect. The multilevel modeling approach is also known to be robust to missing data, under the assumption that they are missing at random [4]. Future work could include the development of an integrated approach to perform linkage analysis within the multilevel framework.
Acknowledgments
This research was partially supported by a project grant from the Network of Centres of Excellence in Mathematics (Canada). SBB is a Senior Investigator of the Canadian Institutes for Health Research.
References

1. Leyland AH, Goldstein H, Eds: Multilevel Modelling of Health Statistics.
2. Levy D, DeStefano AL, Larson MG, O'Donnell CJ, Lifton RP, Gavras H, Cupples LA, Myers RH: Evidence for a gene influencing blood pressure on chromosome 17. Genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham Heart Study. Hypertension 2000, 36:477-483.
3. Littell R, Milliken G, Stroup W, Wolfinger R: SAS System for mixed models.
4. Verbeke G, Molenberghs G: Linear Mixed Models in Practice. A SAS-Oriented Approach.
5. Almasy L, Blangero J: Multipoint quantitative trait linkage analysis in general pedigrees. Am J Hum Genet 1998, 62:1198-1211.
6. Palmer L, Jacobs K, Scurrah K, Xu X, Horvath S, Weiss S: Genome-wide linkage analysis in a general population sample using σ²_A random effects (SSARs) fitted by Gibbs sampling. Genet Epidemiol 2001, 21(suppl 1):S674-S679.
7. Scurrah K, Tobin T, Burton P: Longitudinal variance components models for systolic blood pressure, fitted using Gibbs sampling. BMC Genetics 2003, 4(suppl 1):S25.