Open Access Research article

Linkage analysis of longitudinal data and design consideration

Heping Zhang1* and Xiaoyun Zhong2

Author Affiliations

1 Department of Epidemiology and Public Health, Yale University School of Medicine, 60 College Street, New Haven, CT 06520-8034, USA

2 Department of Medicine, University of Massachusetts Medical School, 55 Lake Avenue North, Worcester, MA 01655, USA

BMC Genetics 2006, 7:37  doi:10.1186/1471-2156-7-37

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2156/7/37


Received: 25 April 2006
Accepted: 12 June 2006
Published: 12 June 2006

© 2006 Zhang and Zhong; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Statistical methods have recently been proposed to analyze longitudinal data in genetic studies. So far, little attention has been paid to the relationship among key factors in genetic longitudinal studies, including power, the number of families or sibships, and the number of repeated measures per subject.

Results

We proposed a variance component model that extends classic variance component models for a single quantitative trait to mapping longitudinal traits. Our model includes covariate effects and allows genetic effects to vary over time. Using our proposed model, we examined the power, pedigree structures, and sample size through simulation experiments.

Conclusion

Our simulation results provide useful insights into the study design for genetic, longitudinal studies. For example, collecting a small number of large sibships is much more powerful than collecting a large number of small sibships or increasing the number of repeated measures, when the total number of measurements is comparable.

Background

Longitudinal study design has been routinely used to investigate the etiology and epidemiology of complex diseases, and statistical methods for analyzing longitudinal data are well established [1]. However, there are limited applications of longitudinal data in genetic studies.

Province and Rao [2] used path analysis for assessing familial aggregation in the presence of temporal trends, although their analysis did not include genetic marker information. Longitudinal studies have also been used on a few occasions for twin and adoption studies (e.g., [3-6]). However, the main purpose of those studies was to assess the heritability of a trait, rather than to map candidate loci.

Using an ad hoc approach, Levy and colleagues [7] conducted a linkage scan of the Framingham Heart Study. They regressed the phenotype against covariates as in a standard mixed effects model, and then treated the residuals corresponding to individual measurements as a quantitative trait in standard linkage analysis software such as SOLAR [8]. More recently, at Genetic Analysis Workshop 13, some participants examined two-step models and some proposed joint models [9]. The first step in a two-step model is similar to that of Levy et al. [7]: an "ordinary" longitudinal model is fitted without consideration of genetic markers or family structures. Then, in the second step, linkage analysis is performed on one or more statistics derived from the first step. While such two-step methods are practical and simple, they are not ideal. For example, even if the covariate effects are additive to the genetic effects, potentially useful information can be lost in deriving the residuals or other summary statistics. Besides, choosing among different statistics (e.g., residuals and averages) to be used in the second stage increases the number of tests to be performed, which raises the multiple comparison issue. Equally important, the lack of a well-defined statistical model directly associating the original phenotype with the inheritance of the markers makes it infeasible to conduct formal statistical inference. In fact, the authors in Genetic Analysis Workshop 13 [9] clearly pointed out that a joint approach to simultaneously estimating genetic and longitudinal model parameters is appealing, because estimates of genetic and longitudinal parameters will be mutually adjusted for one another. Thus, in this report, we consider a joint model that is related to some of the models described in [9]. Our main objective is to use our model to examine the relationship among key factors in genetic longitudinal studies, including power, the number of families or sibships, and the number of repeated measures per subject.

There is a growing effort to develop mixed effects models that separate the genetic effect from environmental effects [10] and that incorporate temporal information [11]. However, those models do not have simple structures to accommodate genetic and temporal interactions, or to enable us to assess the longitudinal study design in linkage analysis. This raises computational concerns and may limit the analyses that can be performed, as pointed out in [9]. Hence, our idea is to use a realistic yet simple variance component model that can be used to analyze general pedigree data such as the Framingham Heart Study and that allows us to consider age-specific genetic effects and related study design issues. We choose a variance component model because this type of model is well established for linkage analysis of quantitative traits (e.g., [8,12,13]).

Results

In this section, we report simulation results that assess the type I error rate based on the asymptotic theory and the power of our method to detect linkage. We are particularly interested in how effectively repeated measures improve power. For example, how do we determine the most cost-effective number of repeated measures? The computation was performed in the statistical software R using our own program, which is available upon request. We should note that our model and program have been used to analyze general pedigree data such as the Framingham Heart Study (to be reported in a future report), although our simulation below is focused on sibships to reduce the computational burden.

Nuclear families were simulated, and fully informative markers with four equally frequent alleles were generated. All parental alleles were distinguished. For the nuclear families, phenotypes were simulated only for the siblings. In all the simulations, each sib in every nuclear family has 5 measurements taken at different times. The measurement times were simulated simply as (1, 2, 3, 4, 5). A covariate was simulated from a uniform distribution between 0 and 1. For clarity, we used $f(X, t) = \beta_0 + \beta_1 t + \beta_2 X$ to generate the data, where $\beta = (\beta_0, \beta_1, \beta_2)' = (1, 1, 1)'$ and $\beta_0$, $\beta_1$ and $\beta_2$ are the parameters for the intercept, the time, and the simulated covariate in the mean structure. As in related studies [13], we did not consider dominance effects in the simulation studies and set the dominance variances at the tested locus and over the rest of the genome to zero, $(\sigma_{gd}^2, \sigma_{Gd}^2) = (0, 0)$.

Type I error rates

To evaluate the type I error rates of the proposed tests, we considered two different null models. The first null model assumes that the genetic linkage effect due to the tested QTL and the polygenic effect are both zero, that is, the additive variances at the QTL and over the rest of the genome are $(\sigma_{ga}^2, \sigma_{Ga}^2) = (0, 0)$. The second null model assumes that there is no genetic linkage effect due to the tested QTL but there is some polygenic effect, with $(\sigma_{ga}^2, \sigma_{Ga}^2) = (0, 1)$. We also simulated a measurement error from a normal distribution with variance $\sigma^2 = 7$ and an autocorrelation of $\exp(-0.5|t - u|)$ between measurements at two time points t and u for a sib. We considered in the analysis two choices of s(t): linear [$s(t) = s_0 + s_1 t$] and quadratic [$s(t) = s_0 + s_1 t + s_2 t^2$]. We simulated 5,000 replications of 100 sib pairs.
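As a rough illustration of the simulation setup just described, the following Python sketch generates the repeated measures for one sib pair. It is not the authors' R program. It assumes a linear mean function (intercept, time, and covariate, matching the description of β), draws the major-gene and polygenic effects directly as normal random effects conditional on the IBD sharing proportion rather than simulating marker genotypes, and uses one time-constant covariate value per sib. The function name and its arguments are hypothetical.

```python
import numpy as np

def simulate_sib_pair(rng, s_coef=(1.0, 0.1), var_qtl=0.0, var_poly=1.0,
                      var_err=7.0, alpha=0.5, beta=(1.0, 1.0, 1.0),
                      times=(1, 2, 3, 4, 5)):
    """Simulate one sib pair; returns (y, x, pi) with y of shape (2, len(times))."""
    t = np.asarray(times, dtype=float)
    # Full sibs share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2, 1/4,
    # so the sharing proportion pi takes the values 0, 0.5 or 1.
    pi = float(rng.choice([0.0, 0.5, 1.0], p=[0.25, 0.5, 0.25]))
    # Major-gene effects: variance var_qtl, covariance between sibs pi * var_qtl.
    g1 = rng.multivariate_normal([0.0, 0.0],
                                 var_qtl * np.array([[1.0, pi], [pi, 1.0]]))
    # Polygenic effects: covariance between sibs 2 * kinship * var_poly = 0.5 * var_poly.
    g2 = rng.multivariate_normal([0.0, 0.0],
                                 var_poly * np.array([[1.0, 0.5], [0.5, 1.0]]))
    # Within-subject errors with covariance var_err * exp(-alpha * |t - u|).
    err_cov = var_err * np.exp(-alpha * np.abs(t[:, None] - t[None, :]))
    e = rng.multivariate_normal(np.zeros(t.size), err_cov, size=2)
    # Assumption: one Uniform(0, 1) covariate value per sib, constant over time.
    x = rng.uniform(0.0, 1.0, size=2)
    s = s_coef[0] + s_coef[1] * t          # linear s(t) = s0 + s1 * t
    mean = beta[0] + beta[1] * t[None, :] + beta[2] * x[:, None]
    y = mean + s[None, :] * g1[:, None] + g2[:, None] + e
    return y, x, pi

rng = np.random.default_rng(2006)
# Null model B: no QTL effect, polygenic variance 1, 100 sib pairs.
data = [simulate_sib_pair(rng, var_qtl=0.0, var_poly=1.0) for _ in range(100)]
```

Setting `var_poly=0.0` as well would correspond to the first null model, and a positive `var_qtl` gives data under linkage.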

A likelihood ratio test is used to test the null hypothesis that the genetic variance due to the tested QTL equals zero (no linkage).

We use two times the natural logarithm of the likelihood ratio as the test statistic. Its asymptotic distribution appears to be a mixture of $\chi^2$ distributions [16], but the degrees of freedom depend on s(t).

When s(t) is constant, the model is equivalent to the traditional variance-component model since we can consider only one independent parameter, i.e., either $s_0$ or the additive variance at the tested QTL, $\sigma_{ga}^2$. In this case, the test statistic asymptotically follows $\frac{1}{2}\chi_0^2 + \frac{1}{2}\chi_1^2$. For a linear s(t), the asymptotic distribution of the test statistic appears to be <a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M6">View MathML</a>. For a quadratic s(t), the asymptotic distribution of the test statistic appears to be <a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M7">View MathML</a>. Because we do not have theoretical proofs for the asymptotic distributions of the test statistic, we derived critical values empirically through simulations.
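The two ways of calibrating the test mentioned here can be sketched in a few lines of Python. The first function assumes the constant-s(t) case, for which the standard variance-component reference distribution is an equal mixture of a point mass at zero and a $\chi_1^2$ distribution; the second takes an empirical critical value from simulated null statistics. Both function names are hypothetical, and this is only an illustration, not the authors' procedure.

```python
import math

def mixture_pvalue(lrt):
    """P-value of an LRT statistic under the 50:50 mixture of chi2_0 and chi2_1."""
    if lrt <= 0.0:
        return 1.0
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); the point mass at 0 never exceeds x > 0.
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))

def empirical_critical_value(null_stats, level):
    """Order statistic exceeded by at most a fraction `level` of the null replicates."""
    ordered = sorted(null_stats)
    k = math.ceil((1.0 - level) * len(ordered)) - 1
    return ordered[max(k, 0)]

# An LRT statistic of about 2.71 corresponds to a p-value of about 0.05
# under the mixture, compared with about 0.10 under a plain chi2_1.
p = mixture_pvalue(2.7055)
```

For the linear and quadratic forms of s(t), only the empirical route applies, since the reference distribution is not established theoretically.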

In practice, we do not know the form of s(t). However, we can use backward selection as in regression analysis, beginning with the quadratic polynomial and testing whether the coefficients are zero. This strategy can serve as a guide in determining the final form of s(t).

Table 1 presents the empirical type I error rates based on 5,000 simulated replications under two null models. The rejection rates in the table were obtained by computing the frequencies at which the null hypotheses were rejected at the critical values from the stated asymptotic distributions. Given that we used only 100 sib pairs, the empirical type I error rates are numerically close to the nominal significance levels.
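To judge whether an empirical rate is "numerically close" to the nominal level, it helps to know the Monte Carlo uncertainty of a rejection frequency. The short Python sketch below computes a rejection rate and its binomial standard error; the helper names are hypothetical and the calculation is a generic one, not taken from the article.

```python
import math

def rejection_rate(stats, critical_value):
    """Fraction of simulated test statistics that exceed the critical value."""
    return sum(1 for s in stats if s > critical_value) / len(stats)

def monte_carlo_se(rate, n_reps):
    """Binomial standard error of a rejection rate estimated from n_reps replicates."""
    return math.sqrt(rate * (1.0 - rate) / n_reps)

# With 5,000 replicates, a true rate of 0.05 is estimated with a standard
# error of roughly 0.003, so estimates within about 0.006 of 0.05 are
# compatible with the nominal level.
se = monte_carlo_se(0.05, 5000)
```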

Table 1. Type I error rate comparisons based on 5000 simulations of 100 sib pairs under two different types of null models. Null model A is the model simulated under no heritability due to the tested QTL and no polygenic heritability. Null model B is the model without heritability due to the tested QTL but with polygenic heritability $h^2$ and additive polygenic variance $\sigma_{Ga}^2 = 1$. The other underlying parameters are $(\beta_0, \beta_1, \beta_2) = (1, 1, 1)$, and $(\sigma^2, \alpha) = (7, 0.5)$. The assumed s(t) is labeled as "l" for $s_0 + s_1 t$, and "q" for $s_0 + s_1 t + s_2 t^2$.

Power comparisons

To compare the power increment from larger sibships, we considered the scenarios of collecting 200 sib pairs, 400 sib pairs, and 200 nuclear families with 4 siblings each, so that we can assess the corresponding effects of the number of nuclear families, the size of the nuclear families, and the number of repeated measures on power. We simulated data from the following three forms of s(t): (a) s(t) = 1 + 0.1t; (b) s(t) = 1; (c) <a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M8">View MathML</a>. We also generated measurement errors from a multivariate normal distribution with variance $\sigma^2$ and within-subject autocorrelation $\exp(-\alpha|t - u|)$ between measurements at two time points t and u. To evaluate the power, we conducted a number of experiments using various genetic models, each specified by the additive variance at the QTL, the additive polygenic variance, the error variance, and the autocorrelation parameter: (a) $(\sigma_{ga}^2, \sigma_{Ga}^2, \sigma^2, \alpha)' = (2, 1, 7, 0.5)'$; (b) $(\sigma_{ga}^2, \sigma_{Ga}^2, \sigma^2, \alpha)' = (1, 1, 8, 0.5)'$; (c) $(\sigma_{ga}^2, \sigma_{Ga}^2, \sigma^2, \alpha)' = (0.5, 1, 8.5, 0.5)'$. Note that these four parameters determine the extent of the overall genetic heritability as well as the heritability due to the specific locus under consideration.

When presenting our power assessment, we make use of a generalized heritability measure for longitudinal traits proposed by de Andrade et al. [11]. To incorporate the serial variance components, we express the polygenic and major gene heritabilities in our model as

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M9">View MathML</a>

Table 2 displays the polygenic and major gene heritabilities used in our simulation models when different numbers of repeated measurements are used.

Table 2. The polygenic and major gene heritabilities ($h^2$ and <a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M10">View MathML</a>) used in our simulation models

Regardless of the true form of s(t), in our estimation we assumed s(t) to be one of the following three forms: $s(t) = s_0$, $s(t) = s_0 + s_1 t$, and $s(t) = s_0 + s_1 t + s_2 t^2$, where $s_0$ is nonnegative and may need to be estimated together with $s_1$ and/or $s_2$, depending on the choice. As stated above, one of the true forms of s(t) is the logit function; we included it because we want to know what happens in linkage detection when s(t) is misspecified.

To understand the gain in power from more repeated measures, we examined the power using all or some of the 5 measurements for each sib. We also compared the power from our models with the power of the traditional variance component (VC) method applied to a single measure. The single measure can be a measurement at a particular time point or the average of the five measurements for each sib.

Tables 3, 4, and 5 display the power in the experiments as specified above. To appreciate the incremental gain in power as the number of repeated measures increases, we compared the power estimates when we used all or some of the 5 repeated measurements. As expected, the power increases as the number of repeated measures and/or the number of families increases. However, the increment in power is not uniform, and depends on the significance level. For example, ascertaining 200 sib pairs with four repeated measures tends to yield better power than collecting 400 sib pairs with two repeated measures when there is a gene-time interaction, and vice versa when there is no gene-time interaction. The information from these tables underscores the importance of conducting power calculations under the specific design and significance level in order to choose the most cost-effective design.

Table 3. The power comparisons based on 500 replicates. The underlying parameters are $(\beta_0, \beta_1, \beta_2) = (1, 1, 1)$, and $(\sigma_{ga}^2, \sigma_{Ga}^2, \sigma^2, \alpha) = (2, 1, 7, 0.5)$, where $\sigma_{ga}^2$ and $\sigma_{Ga}^2$ are the additive variances at the QTL and over the rest of the genome. The assumed s(t) is labeled as "c" for constant, "l" for $s_0 + s_1 t$, and "q" for $s_0 + s_1 t + s_2 t^2$.

Table 4. The power comparisons based on 500 replicates. The underlying parameters are $(\beta_0, \beta_1, \beta_2) = (1, 1, 1)$, and $(\sigma_{ga}^2, \sigma_{Ga}^2, \sigma^2, \alpha) = (1, 1, 8, 0.5)$, where $\sigma_{ga}^2$ and $\sigma_{Ga}^2$ are the additive variances at the QTL and over the rest of the genome. The assumed s(t) is labeled as "c" for constant, "l" for $s_0 + s_1 t$, and "q" for $s_0 + s_1 t + s_2 t^2$.

Table 5. The power based on 500 replicates of 200 4-sib families. For comparison with the other tables, we consider two repeated measures only. The underlying parameters are $(\beta_0, \beta_1, \beta_2) = (1, 1, 1)$ with various settings of $(\sigma_{ga}^2, \sigma_{Ga}^2, \sigma^2, \alpha)$, where $\sigma_{ga}^2$ and $\sigma_{Ga}^2$ are the additive variances at the QTL and over the rest of the genome. The assumed s(t) is labeled as "c" for constant, "l" for $s_0 + s_1 t$, and "q" for $s_0 + s_1 t + s_2 t^2$.

Tables 3 and 4 reveal a serious loss of power from ignoring a gene-time interaction. For example, in Table 3 when the underlying s(t) = 1 + 0.1t, with 5 repeated measures, the power estimates obtained by ignoring s(t) were 0.77, 0.56, 0.26, and 0.09, respectively, at significance levels 0.05, 0.01, 0.001, and 0.0001. In contrast, the respective power estimates increased to 0.90, 0.78, 0.45, and 0.24 when we estimated s(t) from $s_0 + s_1 t$. We should also note here that the fold increase is more dramatic at a more stringent significance level. On the other hand, is there a loss of power if we consider s(t) when there is no time-dependent genetic effect? Or, more broadly, what happens to the power if the time-dependent effect is misspecified? Tables 3, 4, and 5 address these questions. As expected, the power is at its peak when the underlying time trend is correctly specified. However, even with a misspecified trend, the test based on our model is more powerful than the one using a single measure, regardless of whether it was from a particular age or the average of the same number of repeated measures. We should note that, in our experiment, the use of the average of repeated measures yields more power than the use of a single measure at a given time point. In other words, without any consideration for the cost and effectiveness, we gain power from repeated measures even with a simple approach.
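The remark about the fold increase can be checked directly from the eight power estimates quoted in this paragraph. The following Python snippet is only a worked calculation on those quoted numbers; the variable names are ours.

```python
# Power estimates quoted above for the underlying s(t) = 1 + 0.1t with
# 5 repeated measures (Table 3), at four significance levels.
levels = [0.05, 0.01, 0.001, 0.0001]
power_ignoring_s = [0.77, 0.56, 0.26, 0.09]   # s(t) treated as constant
power_linear_s = [0.90, 0.78, 0.45, 0.24]     # s(t) estimated as s0 + s1 * t

fold = [round(lin / const, 2)
        for lin, const in zip(power_linear_s, power_ignoring_s)]

for level, ratio in zip(levels, fold):
    print(f"level {level}: fold increase {ratio}")
```

The ratios grow from about 1.2 at the 0.05 level to about 2.7 at the 0.0001 level, which is the pattern described in the text.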

Finally, Tables 3, 4, and 5 reveal the substantial gain in power from ascertaining large pedigrees. Table 5 displays the power of using 200 4-sib families. The power estimates using 400 sib pairs are available in Tables 3 and 4. Clearly, whenever feasible, collecting large sibships is more effective than collecting more sibships or more repeats.
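One informal way to see why larger sibships help is to count the relative pairs each design contributes at a comparable number of measurements. The Python sketch below does this counting for the three designs compared in the text. It is a heuristic of ours for intuition, not part of the authors' analysis, and the function name is hypothetical.

```python
from math import comb

def design_summary(n_families, sibship_size, n_measures):
    """Total measurements and number of distinct sib pairs in a design."""
    return {
        "measurements": n_families * sibship_size * n_measures,
        "sib_pairs": n_families * comb(sibship_size, 2),
    }

designs = {
    "200 sib pairs, 4 measures": design_summary(200, 2, 4),
    "400 sib pairs, 2 measures": design_summary(400, 2, 2),
    "200 4-sib families, 2 measures": design_summary(200, 4, 2),
}
for name, summary in designs.items():
    print(name, summary)
```

All three designs involve 1,600 measurements, but the 4-sib design yields 1,200 sib pairs compared with 400 or 200 for the sib-pair designs.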

Discussion

In this work, we proposed a variance component model to map candidate genes when the quantitative trait is measured repeatedly. A notable feature of our model is that it accommodates a potential gene-time interaction. In the existing literature, longitudinal information on the trait is sometimes re-processed into a single trait and then the standard variance component model is applied [7]. Agreeing with other authors, we believe it is useful to have a unified model so that formal statistical inference can be performed. This benefit is evident from the simulation reported here.

We should note that the power is low with the sample sizes that we considered when the significance level is set at 0.0001. We chose these sample sizes purely to reduce the computational time of our simulation; since our purpose is to compare the power in various design settings, the absolute level of power is not critical. In practice, if 80% power is desirable, for example, both the sample size and the number of simulation replicates should be increased.

Despite the fact that the longitudinal study design is very popular in epidemiological and medical research, its use is still limited in linkage analysis [11]. Here, we only discuss a basic model to explore the potential of using longitudinal data and to investigate cost-effective designs. Our model is related to, but has a simpler structure than, that of de Andrade et al. [11]. We focus on the time at which the data are collected, but different study subjects may have data available at different time points from others. We also allow a potentially general temporal trend to interact with the genetic effect. In contrast, de Andrade et al. [11] proposed a model that assumed an individual genetic effect at every time point, which requires a uniform time schedule for all study subjects. This is a reasonable assumption for some studies including the Framingham Heart Study, but it may be restrictive for other studies.

Clearly, many important research issues warrant further investigation. For example, we need to consider gene-gene interactions, gene-environment interactions, and more general forms of gene-time interaction and fixed effects. Other classic issues including sample selection, ascertainment bias, multiple genes, and imprinting also require further investigation.

Conclusion

We conducted a number of simulation studies to explore the increment of power when the number of sibships is increased, when the number of repeated measures is increased, and when the size of families is increased. While we expect that these factors enhance the power, how they do so is rather intriguing. Our results can provide useful guidance for designing a genetic, longitudinal study to balance the cost, feasibility, and power. For example, collecting a small number of families with a large sibship is more effective than collecting a comparatively large number of families with a small sibship. Collecting fewer families with more repeated measures may or may not lead to more power than collecting more families with fewer repeated measures, depending on the underlying genetic models. In general, however, the relationship between the power and design is subtle, and depends on the significance level and obviously the size of genetic effects. It is wise to conduct appropriate power simulations before a genetic, longitudinal study is carried out so that the cost, the feasibility, and power can be balanced. Software can be requested from the authors for such simulations.

Although our simulations were based on nuclear families, our model can handle general pedigrees as we have used it to analyze data from the Framingham Heart Study for which the pedigree size was, on average, 5 and ranged from 2 to 29.

Methods

The model and methods

Let y denote a quantitative trait. For convenience, we first consider one pedigree. Assuming independence between pedigrees, the likelihood for multiple pedigrees is simply the product of the individual pedigree likelihoods.

Let i refer to the ith member in a pedigree and $t_{ij}$ be the time when the quantitative trait is measured at the jth occasion, $j = 1, \ldots, T_i$ and $i = 1, \ldots, n$. Consider the model:

$$y_i(t_{ij}) = f(X_i, t_{ij}) + s(t_{ij})\gamma_{i1} + \gamma_{i2} + e_i(t_{ij}), \qquad (1)$$

where $f(X_i, t_{ij})$ is a function of the fixed effect $X_i$ and time $t_{ij}$, $s(t_{ij})$ is a simple parametric function to accommodate time-varying genetic effects, $\gamma_{i1}$ is the random effect for a major gene, $\gamma_{i2}$ is the random effect for unspecified polygenic effects over the genome, and $e_i(t_{ij})$ is the measurement error, $j = 1, \ldots, T_i$ and $i = 1, \ldots, n$. We assume that $\gamma_{i1}$, $\gamma_{i2}$, and $e_i$ are independent, although $e_i(t_{ij})$, $j = 1, \ldots, T_i$, has a within-subject correlation structure that needs to be specified on a case-by-case basis. It follows that

$$\operatorname{cov}(y_i(t), y_l(u)) = s(t)s(u)\operatorname{cov}(\gamma_{i1}, \gamma_{l1}) + \operatorname{cov}(\gamma_{i2}, \gamma_{l2}) + \delta(i = l)\sigma(t, u),$$

where σ(t, u) is the covariance function for e(t) and e(u) and δ(i = l) is the identity indicator. In addition, the covariances of $\gamma_{i1}$ and $\gamma_{i2}$ can be partitioned into additive and dominance variances as follows:

$$\operatorname{cov}(\gamma_{i1}, \gamma_{l1}) = \pi_{il}\sigma_{ga}^2 + k_{2,il}\sigma_{gd}^2, \qquad \pi_{il} = \tfrac{1}{2}k_{1,il} + k_{2,il},$$

and

$$\operatorname{cov}(\gamma_{i2}, \gamma_{l2}) = 2\phi_{il}\sigma_{Ga}^2 + \tau_{il}\sigma_{Gd}^2,$$

where $k_{1,il}$ and $k_{2,il}$ represent the k coefficients of [14] for the probability of members i and l sharing 1 and 2 alleles, respectively, identical by descent (IBD) at the locus of interest, $\phi_{il}$ and $\tau_{il}$ are respectively the expected kinship coefficient and the expected probability of sharing 2 alleles IBD over the residual components of the genome, $\sigma_{ga}^2$ and $\sigma_{gd}^2$ are respectively the additive and dominance genetic variances at the locus of interest, and $\sigma_{Ga}^2$ and $\sigma_{Gd}^2$ are respectively the total additive and dominance genetic variances over the residual components of the genome.

With $s(t_{ij}) = 1$, without $f(X_i, t_{ij})$, and without repeated measures, model (1) reduces to the standard variance component model for quantitative traits. Thus, model (1) is an extension of the standard variance component model to accommodate repeated measures with a structured gene-time interaction. The structured gene-time interaction distinguishes model (1) from the existing models (e.g. [11]). Although $\gamma_{i1}$ does not depend on time, the manifestation of genetic effects over time is accomplished through s(t). For simplicity, model (1) does not include time-varying polygenic effects: there is no interaction term between $\gamma_{i2}$ and time.

Parameter estimation and hypothesis testing

If we arrange the phenotype in model (1) as

$$y = (y_1(t_{11}), \ldots, y_1(t_{1T_1}), \ldots, y_i(t_{i1}), \ldots, y_i(t_{iT_i}), \ldots, y_n(t_{n1}), \ldots, y_n(t_{nT_n}))', \qquad (2)$$

then its covariance matrix is

$$\Sigma = \operatorname{diag}(s(t_i))\,(\sigma_{ga}^2\Pi + \sigma_{gd}^2 K)\,\operatorname{diag}(s(t_i))' + \operatorname{diag}(1_{T_i})\,(2\sigma_{Ga}^2\Phi + \sigma_{Gd}^2\Omega)\,\operatorname{diag}(1_{T_i})' + E, \qquad (3)$$

where $s(t_i) = (s(t_{i1}), \ldots, s(t_{iT_i}))'$, $\Pi = (\pi_{il})_{n \times n}$, $K = (k_{2,il})_{n \times n}$, $\Phi = (\phi_{il})_{n \times n}$, $\Omega = (\tau_{il})_{n \times n}$, $1_{T_i}$ is a vector of $T_i$ 1's, $\operatorname{diag}(v_i)$ denotes the block diagonal matrix with blocks $v_1, \ldots, v_n$, $\sigma_{ga}^2$ and $\sigma_{gd}^2$ are the additive and dominance variances at the locus of interest, $\sigma_{Ga}^2$ and $\sigma_{Gd}^2$ are their polygenic counterparts, and E is a block diagonal matrix,

$$E = \operatorname{diag}(E_1, \ldots, E_n),$$

in which

$$E_i = \left(\sigma(t_{ij}, t_{ik})\right)_{T_i \times T_i}.$$

For example, if $\sigma(t, u) = \sigma^2 e^{-\alpha|t - u|}$, we have

$$E_i = \sigma^2 \begin{pmatrix} 1 & e^{-\alpha|t_{i1} - t_{i2}|} & \cdots & e^{-\alpha|t_{i1} - t_{iT_i}|} \\ e^{-\alpha|t_{i2} - t_{i1}|} & 1 & \cdots & e^{-\alpha|t_{i2} - t_{iT_i}|} \\ \vdots & \vdots & \ddots & \vdots \\ e^{-\alpha|t_{iT_i} - t_{i1}|} & e^{-\alpha|t_{iT_i} - t_{i2}|} & \cdots & 1 \end{pmatrix}.$$

In this work, we assume that $\gamma_{i1}$, $\gamma_{i2}$, and $e_i$ have normal distributions with mean 0. If normality is not assumed, a generalized estimating equation approach can be adopted; however, we will not explore this approach here. For clarity, we consider a specific version of model (1). Namely, let $f(X_i, t_{ij}) = \beta_0 + t_{ij}\beta_1 + X_i(t_{ij})\beta_2$, where $\beta_2$ is a p-vector of parameters. In addition, assume that s(t) is a first-order polynomial function, $s(t) = s_0 + s_1 t$.
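The covariance structure described above can be assembled numerically block by block, using the covariance formula for $y_i(t)$ and $y_l(u)$ given with model (1). The Python sketch below does this for one pedigree under a polynomial s(t) and the exponential error covariance. It assumes the usual variance-component partition (additive terms weighted by the IBD sharing proportion and by twice the kinship coefficient, dominance terms weighted by the probability of sharing 2 alleles IBD). The function and argument names are hypothetical, and this is an illustration rather than the authors' R implementation.

```python
import numpy as np

def pedigree_covariance(times, s_coef, pi, k2, phi, tau,
                        var_ga, var_gd, var_Ga, var_Gd, var_err, alpha):
    """Covariance matrix of the stacked phenotype vector for one pedigree.

    times : list of 1-D sequences, the measurement times of each member
    s_coef: polynomial coefficients (s0, s1, ...) of s(t)
    pi, k2, phi, tau : (n, n) arrays of IBD and kinship quantities
    """
    times = [np.asarray(t, dtype=float) for t in times]
    # np.polyval expects the highest-degree coefficient first.
    s = [np.polyval(s_coef[::-1], t) for t in times]
    n = len(times)
    blocks = [[None] * n for _ in range(n)]
    for i in range(n):
        for l in range(n):
            # Major-gene part, scaled by s(t) s(u).
            major = (pi[i][l] * var_ga + k2[i][l] * var_gd) * np.outer(s[i], s[l])
            # Polygenic part, constant over time.
            poly = (2.0 * phi[i][l] * var_Ga + tau[i][l] * var_Gd) \
                * np.ones((times[i].size, times[l].size))
            block = major + poly
            if i == l:
                # Within-subject error covariance var_err * exp(-alpha |t - u|).
                lag = np.abs(times[i][:, None] - times[i][None, :])
                block = block + var_err * np.exp(-alpha * lag)
            blocks[i][l] = block
    return np.block(blocks)

# A sib pair with expected sharing values (pi = 0.5, k2 = 0.25 between sibs).
sib = dict(pi=[[1, 0.5], [0.5, 1]], k2=[[1, 0.25], [0.25, 1]],
           phi=[[0.5, 0.25], [0.25, 0.5]], tau=[[1, 0.25], [0.25, 1]])
Sigma = pedigree_covariance([[1, 2], [1, 2]], (1.0, 0.1), **sib,
                            var_ga=2.0, var_gd=0.0, var_Ga=1.0, var_Gd=0.0,
                            var_err=7.0, alpha=0.5)
```

In a linkage analysis the off-diagonal entries of `pi` and `k2` would be estimated from marker data for each pair rather than set to their expected values.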

Let

β = (β0, β1, β2)'     (4)

be the vector of fixed effect parameters, and

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M23','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M23">View MathML</a>

be the vector of the covariance parameters. We estimate these parameters through the restricted maximum likelihood (REML) approach introduced by Patterson and Thompson [15] which takes into account the loss in degrees of freedom resulting from estimating fixed effects and avoids the bias in the estimation of covariance parameters.

Note that y has a multivariate normal distribution with mean $A\beta$ and covariance Σ, where

$$A = \begin{pmatrix} 1 & t_{11} & X_1(t_{11}) \\ \vdots & \vdots & \vdots \\ 1 & t_{1T_1} & X_1(t_{1T_1}) \\ \vdots & \vdots & \vdots \\ 1 & t_{nT_n} & X_n(t_{nT_n}) \end{pmatrix}. \qquad (6)$$

Now, let us consider M independent pedigrees. Let

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M25">View MathML</a>

where $y^{(m)}$, $A^{(m)}$, and $\Sigma^{(m)}$ are of the forms (2), (6), and (3), respectively, for the mth pedigree, m = 1, ..., M.
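A minimal sketch of how data from M independent pedigrees might be combined: the response vectors and design matrices are stacked, and the covariance matrices are placed on the diagonal of a block-diagonal matrix, since independent pedigrees have zero covariance with one another. The function name `stack_pedigrees` and the toy inputs are hypothetical. The exact stacked form used in the paper is the one given in the equation above.

```python
import numpy as np

def stack_pedigrees(ys, As, Sigmas):
    """Stack per-pedigree responses, design matrices, and covariances."""
    Y = np.concatenate(ys)   # all responses in one vector
    A = np.vstack(As)        # design rows stacked in the same order
    n = sum(S.shape[0] for S in Sigmas)
    Sigma = np.zeros((n, n))
    pos = 0
    for S in Sigmas:
        k = S.shape[0]
        # Independent pedigrees: only the diagonal blocks are non-zero.
        Sigma[pos:pos + k, pos:pos + k] = S
        pos += k
    return Y, A, Sigma

# Hypothetical toy input: two pedigrees with 2 and 3 measurements.
ys = [np.array([1.0, 2.0]), np.array([0.5, 1.5, 2.5])]
As = [np.ones((2, 1)), np.ones((3, 1))]
Sigmas = [np.eye(2), 2.0 * np.eye(3)]
Y, A, Sigma = stack_pedigrees(ys, As, Sigmas)
```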

The REML log likelihood is given by

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M26','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M26">View MathML</a>

Maximizing L(β, θ) with respect to β gives

$\hat{\beta} = (A'\Sigma^{-1}A)^{-1}A'\Sigma^{-1}Y.$

Plugging $\hat{\beta}$ into the log likelihood, we have

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M28','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M28">View MathML</a>

where $P = \Sigma^{-1}(I - A(A'\Sigma^{-1}A)^{-1}A'\Sigma^{-1})$. The REML estimator of θ is obtained by maximizing the log likelihood l(θ). Substituting this estimator of θ into $\hat{\beta}$ gives the REML estimator of β.
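The generalized least squares estimate of β and the matrix P can be computed directly for a given Σ. The sketch below assumes a small hypothetical design (intercept, time, and one covariate) and an exponential covariance with made-up parameter values. The function name `gls_beta_and_P` is not from the paper.

```python
import numpy as np

def gls_beta_and_P(A, Sigma, Y):
    """Return the GLS estimate of beta and the matrix P for a given Sigma."""
    Sigma_inv = np.linalg.inv(Sigma)
    SiA = Sigma_inv @ A                        # Sigma^{-1} A
    M = np.linalg.solve(A.T @ SiA, SiA.T)      # (A' Sigma^{-1} A)^{-1} A' Sigma^{-1}
    beta_hat = M @ Y
    P = Sigma_inv @ (np.eye(len(Y)) - A @ M)   # Sigma^{-1} (I - A M)
    return beta_hat, P

# Hypothetical toy data: four time points and one time-varying covariate.
t = np.array([0.0, 1.0, 2.0, 3.0])
x = np.array([0.5, -1.0, 0.3, 2.0])
A = np.column_stack([np.ones_like(t), t, x])
Sigma = 1.5 * np.exp(-0.4 * np.abs(t[:, None] - t[None, :]))
beta_true = np.array([1.0, 0.5, -2.0])
Y = A @ beta_true   # noise-free response, so GLS should recover beta_true
beta_hat, P = gls_beta_and_P(A, Sigma, Y)
```

A useful check is that P annihilates the fixed-effect design, that is, `P @ A` is numerically zero, which is why l(θ) does not depend on β.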

From the theory of matrix derivatives, we have <a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M29">View MathML</a>, <a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M30','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M30">View MathML</a>, and <a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M31','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M31">View MathML</a>. Therefore, the first-order partial derivative of the log likelihood l(θ) with respect to θ is

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M32','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M32">View MathML</a>

and the second-order partial derivative of the log likelihood l(θ) with respect to θ gives

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M33','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M33">View MathML</a>

Denote the matrix of the negative second partial derivatives of l(θ) as

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M34','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M34">View MathML</a>

A Newton-Raphson algorithm yields

<a onClick="popup('http://www.biomedcentral.com/1471-2156/7/37/mathml/M35','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/7/37/mathml/M35">View MathML</a>

Iterate until changes in successive estimates of all parameters are sufficiently small. Let $\hat{\theta}$ be the converged estimate of θ.
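The iteration can be sketched for the special case in which Σ is linear in θ, that is, Σ(θ) = Σ_i θ_i V_i with known matrices V_i. This linearity is an assumption of the sketch and does not cover every parameter of the model above (for example, α enters nonlinearly). The score and negative second-derivative expressions in the comments are the standard REML forms for this case, and the function name `reml_newton` is hypothetical.

```python
import numpy as np

def reml_newton(y, A, V_list, theta0, tol=1e-10, max_iter=50):
    """Newton-Raphson for REML when Sigma(theta) = sum_i theta_i * V_i."""
    theta = np.asarray(theta0, dtype=float).copy()
    k = len(V_list)
    for _ in range(max_iter):
        Sigma = sum(th * V for th, V in zip(theta, V_list))
        Sigma_inv = np.linalg.inv(Sigma)
        SiA = Sigma_inv @ A
        # P = Sigma^{-1} - Sigma^{-1} A (A' Sigma^{-1} A)^{-1} A' Sigma^{-1}
        P = Sigma_inv - SiA @ np.linalg.solve(A.T @ SiA, SiA.T)
        Py = P @ y
        PV = [P @ V for V in V_list]
        # Score: -1/2 tr(P V_i) + 1/2 y' P V_i P y
        score = np.array([-0.5 * np.trace(PV[i]) + 0.5 * Py @ V_list[i] @ Py
                          for i in range(k)])
        # Negative Hessian: -1/2 tr(P V_i P V_j) + y' P V_i P V_j P y
        H = np.empty((k, k))
        for i in range(k):
            for j in range(k):
                H[i, j] = (-0.5 * np.trace(PV[i] @ PV[j])
                           + Py @ V_list[i] @ PV[j] @ Py)
        step = np.linalg.solve(H, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Simplest check: intercept-only model with Sigma = theta * I.
y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
A = np.ones((5, 1))
theta_hat = reml_newton(y, A, [np.eye(5)], theta0=[np.var(y)])
```

In this intercept-only case the REML estimate should agree with the sample variance computed with divisor n - 1, `np.var(y, ddof=1)`, which illustrates the degrees-of-freedom correction that REML provides. Plain Newton-Raphson can leave the parameter space or diverge from poor starting values, so a practical implementation would likely add step control.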

If (β*, θ*) is the vector of true parameter values, then by classical statistical theory, $(\hat{\beta} - \beta^*, \hat{\theta} - \theta^*)$ asymptotically follows a multivariate normal distribution with mean 0, and the asymptotic covariance matrix can be estimated by $I^{-1}(\hat{\beta}, \hat{\theta})$, where I(β, θ) is the information matrix.

Linkage is tested with a likelihood ratio test that compares the likelihood under the alternative hypothesis, in which the genetic variance component due to the tested QTL is estimated, with the likelihood under the null hypothesis that this variance component equals zero (no linkage). Twice the natural logarithm of the likelihood ratio of these two models may follow a complex asymptotic distribution, namely a mixture of $\chi^2$ distributions [16], and the specific mixture depends on how s(t) is defined.
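To make the test concrete, the sketch below computes the likelihood ratio statistic, an approximate p-value, and the corresponding LOD score from two maximized log likelihoods. It assumes the simplest boundary case, in which a single variance component is tested and the null distribution is a 50:50 mixture of a point mass at zero and a $\chi^2$ distribution with one degree of freedom. That mixture is an assumption of this example only. As noted above, the appropriate mixture depends on how s(t) is defined. The function name `lrt_linkage` and the input values are hypothetical.

```python
import math

def lrt_linkage(loglik_alt, loglik_null):
    """Likelihood ratio statistic, mixture p-value, and LOD score."""
    # Truncate at zero in case of small numerical differences.
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if stat == 0.0:
        p_value = 1.0
    else:
        # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
        p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0))
    lod = stat / (2.0 * math.log(10.0))   # LOD = log10 of the likelihood ratio
    return stat, p_value, lod

# Hypothetical maximized log likelihoods under the two hypotheses.
stat, p_value, lod = lrt_linkage(loglik_alt=-100.0, loglik_null=-102.5)
```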

Authors' contributions

HZ contributed to the conception and design of the study and to the analysis and interpretation of data. XZ contributed to the design of the study, wrote the programs, and performed the simulation analysis. Both authors were involved in writing the manuscript and approved this final version.

Acknowledgements

This research is supported in part by grants DA017713 and DA016750 from the National Institute on Drug Abuse.

References

  1. Diggle P, Liang KY, Zeger SL: Analysis of longitudinal data. Oxford; New York: Oxford University Press; 2002.

  2. Province MA, Rao DC: Familial aggregation in the presence of temporal trends. Statistics in Medicine 1988, 7:185-198.

  3. Eaves LJ, Long J, Heath AC: A theory of developmental change in quantitative phenotypes applied to cognitive development. Behav Genet 1986, 16:143-161.

  4. Phillips K, Fulker DW: Quantitative genetic analysis of longitudinal trends in adoption designs with application to IQ in the Colorado Adoption Project. Behav Genet 1989, 19:621-658.

  5. Williams CJ, Viken R, Rose RJ: Likelihood-based analyses of longitudinal twin and family data: experiences with pedigree-based approaches. Behav Genet 1992, 22:215-223.

  6. Huggins RM, Hoang NH, Loesch DZ: Analysis of longitudinal data from twins. Genetic Epidemiology 2000, 19:345-353.

  7. Levy D, DeStefano AL, Larson MG, O'Donnell CJ, Lifton RP, Gavras H, Cupples LA, Myers RH: Evidence for a gene influencing blood pressure on chromosome 17. Genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham heart study. Hypertension 2000, 36:477-483.

  8. Blangero J, Almasy L: Multipoint oligogenic linkage analysis of quantitative traits. Genetic Epidemiology 1997, 14:959-964.

  9. Gauderman WJ, Macgregor S, Briollais L, Scurrah K, Tobin M, Park T, Wang D, Rao S, John S, Bull S: Longitudinal data analysis in pedigree studies. Genetic Epidemiology 2003, (Suppl 25):18-28.

  10. Jaffrezic F, White IM, Thompson R: Use of the score test as a goodness-of-fit measure of the covariance structure in genetic analysis of longitudinal data. Genetics Selection Evolution 2003, 35:185-198.

  11. de Andrade M, Gueguen R, Visvikis S, Sass C, Siest G, Amos CI: Extension of variance components approach to incorporate temporal trends and longitudinal pedigree data analysis. Genetic Epidemiology 2002, 22:221-232.

  12. Amos CI: Robust variance components approach for assessing genetic linkage in pedigree. American Journal of Human Genetics 1994, 54:535-543.

  13. Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. American Journal of Human Genetics 1998, 62:1198-1211.

  14. Cotterman CW: A calculus for statistico-genetics. PhD thesis. Ohio State University, Columbus; 1940.

  15. Patterson HD, Thompson R: Recovery of inter-block information when block sizes are unequal. Biometrika 1971, 58:545-554.

  16. Self SG, Liang KY: Large sample properties of maximum likelihood estimator and the likelihood ratio test on the boundary of the parameter space. Journal of the American Statistical Association 1987, 82:605-610.