Open Access Methodology article

A joint model for nonparametric functional mapping of longitudinal trajectory and time-to-event

Min Lin1,2 and Rongling Wu1*

Author Affiliations

1 Department of Statistics, University of Florida, Gainesville, FL 32611, USA

2 Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina 27710, USA

BMC Bioinformatics 2006, 7:138  doi:10.1186/1471-2105-7-138

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/7/138


Received: 13 July 2005
Accepted: 15 March 2006
Published: 15 March 2006

© 2006 Lin and Wu; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

The characterization of the relationship between a longitudinal response process and a time-to-event has been a pressing challenge in biostatistical research. This has emerged as an important issue in genetic studies when one attempts to detect the common genes or quantitative trait loci (QTL) that govern both a longitudinal trajectory and developmental event.

Results

We present a joint statistical model for functional mapping of dynamic traits in which the event times and longitudinal traits are taken to depend on a common set of genetic mechanisms. By fitting orthogonal Legendre polynomials to the time-dependent mean vector, our model does not rely on any prespecified parametric curve, which distinguishes it from earlier parametric models of functional mapping. This newly developed nonparametric model is demonstrated and validated by an example for a forest tree in which stemwood growth and the time to first flower are jointly modelled.

Conclusion

Our model allows for the detection of specific QTL that govern both longitudinal traits and developmental processes through either pleiotropic effects or close linkage, or both. This model will have great implications for integrating longitudinal and event data to gain better insights into comprehensive biology and biomedicine.

Background

Although there has been an upsurge of interest in jointly modelling longitudinal and event data during the last decade [1-9], no statistical models have been developed to characterize the shared genetic basis for these two types of traits. In biomedicine, the identification of specific genetic variants responsible for an HIV patient's time-dependent CD4 count and for the time to onset of AIDS symptoms can help to design individualized drugs to control this patient's progression to AIDS. Similarly, in studies of prostate cancer, a shared genetic basis between prostate specific antigen, repeatedly measured for patients following treatment for prostate cancer, and the time to disease recurrence can be used to make optimal treatment schedules for patients. In plants, knowledge about whether the genetic loci for reproductive behaviors, such as the time to first flower and the time to form seeds, also govern growth rates and sizes of plants helps to understand the etiology of plants' adaptation to the environment in which they are grown.

The genetic mapping of quantitative trait loci (QTL) that are responsible for longitudinal traits has long been a difficult issue because of the dynamic features of these traits. More recently, part of this difficulty has been solved by integrating the statistical analysis of longitudinal data into a QTL mapping framework, leading to a so-called functional mapping strategy [10-16]. Statistical models for functional mapping were established on the belief that biological processes can be described by mathematical functions. One of the most significant examples of this is the use of S-shaped logistic curves to model growth trajectories. West et al. [17] indicated from fundamental principles of biophysical processes that logistic forms of growth are biologically crucial for the maintenance of optimal metabolic level and, thereby, the best use of available resources for an organism from birth to adulthood. Because fundamental biological principles are embedded within the statistical model, functional mapping provides a quantitative framework for testing biologically relevant hypotheses at the interplay between gene actions and development.

The concept of functional mapping can be further extended to jointly mapping a longitudinal variable and a time-to-event by incorporating statistical theories developed to characterize the relationships between longitudinal response and event processes [1-9]. However, original functional mapping models for a dynamic trait rely upon explicit mathematical functions that describe the development of the trait. In practice, there are also many situations in which no appropriate curves can be used to describe a biological process. To model curves of arbitrary shape, a different statistical model based on nonparametric theory should be formulated. Polynomials of varying order have the power to fit curves of arbitrary shape. As shown by Kirkpatrick and Heckman [18], Legendre polynomials have several favorable properties for curve fitting, which include: (1) the functions are orthogonal, (2) it is flexible to fit sparse data, (3) higher orders are estimable for high levels of curve complexity and (4) computation is fast because of good convergence.

The purpose of this article is to develop a joint statistical model for nonparametric functional mapping of longitudinal trajectories based on the Legendre polynomials, integrated with time-to-events. This joint model is constructed within the maximum likelihood context, including simultaneous modelling of the mean vector (based on nonparametric approaches) and covariance matrix (based on parametric approaches). By analyzing stem volume growth data in an example of a forest tree, we demonstrate the implications of our joint model. Lastly, the advantages of our model in general biomedical and biological research and the areas in which the model can be further refined are discussed.

The Model

The likelihood function

Consider a mapping population of size n in which a number of molecular markers are genotyped, with the aim of identifying QTL for a longitudinal trait and a time-to-event. Every individual of the mapping population is measured for the longitudinal trait (y) at multiple (say T) time points and for a time-to-event (z). Variable z can be the time to first flower, the timing of cancer malignancy, the time of mortality, or the time of any other event. The inference of unknown QTL genotypes for the phenotypic traits based on observed marker information (M) can be made due to co-segregation between the QTL and markers.

Suppose there are two segregating QTL for longitudinal and event traits in the mapping population, each with genotypes 2, 1 and 0. These two QTL are assumed to be linked or associated with and, therefore, can be inferred from, markers. The joint likelihood function of the two types of phenotypic data and marker information at the two underlying QTL is written as

$$L(\Omega \mid y, z, M) = \prod_{i=1}^{n} \left[ \sum_{j_1=0}^{2} \sum_{j_2=0}^{2} \omega_{j_1 j_2 \mid i}\, f_{j_1 j_2}(\mathbf{y}_i, z_i; \Omega) \right] \qquad (1)$$

where Ω is the unknown vector that defines the QTL positions, time-dependent QTL effects ($\mathbf{u}_{j_1}$ and $v_{j_2}$) and covariance matrix (Σ), $\omega_{j_1 j_2 \mid i}$ is the mixture proportion expressed as the conditional probability of a joint genotype j1j2 for the longitudinal and event QTL given marker genotypes for individual i and $f_{j_1 j_2}(\mathbf{y}_i, z_i; \Omega)$ is the (T + 1)-dimensional multivariate normal distribution function with mean vector ($\mathbf{u}_{j_1}$, $v_{j_2}$) and covariance matrix Σ.

Conditional probabilities

There are different descriptions of the conditional probability, depending on the type of the mapping population. If the mapping population is an experimental cross initiated with two contrasting parents, such as the F2 or backcross, the conditional probability is described in terms of the recombination fractions between the markers and QTL [12,19]. If the two QTL are bracketed by different pairs of markers, the conditional probability of joint QTL genotypes given the marker intervals can be expressed as the product of the corresponding conditional probabilities for QTL genotypes given a single marker interval. If the two QTL are located in the same marker interval, the conditional probabilities should be derived using the principle of 4-point analysis. For a natural population, the association between the QTL and markers can be described by the coefficients of linkage disequilibria [17].
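For the backcross case, the conditional probability of a QTL genotype given its two flanking markers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: there is no crossover interference (so the marker-marker recombination fraction is r = r1 + r2 - 2 r1 r2), marker and QTL genotypes are coded 1 (heterozygote) and 0 (homozygote), and the function name `backcross_qtl_probs` is hypothetical.

```python
def backcross_qtl_probs(r1: float, r2: float) -> dict:
    """P(QTL genotype = 1 | flanking marker genotypes) in a backcross.

    r1: recombination fraction between the left marker and the QTL.
    r2: recombination fraction between the QTL and the right marker.
    Keys are (left marker genotype, right marker genotype).
    """
    # Recombination fraction between the two markers, assuming no interference.
    r = r1 + r2 - 2.0 * r1 * r2
    return {
        (1, 1): (1 - r1) * (1 - r2) / (1 - r),  # no recombination on either side
        (1, 0): (1 - r1) * r2 / r,              # recombination right of the QTL
        (0, 1): r1 * (1 - r2) / r,              # recombination left of the QTL
        (0, 0): r1 * r2 / (1 - r),              # double recombination
    }


probs = backcross_qtl_probs(0.1, 0.1)
# The probability of the alternative QTL genotype (0) is 1 minus each entry.
print(probs)
```

When two QTL lie in different marker intervals, the joint conditional probability is the product of two such single-interval probabilities, as stated in the text.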

Modelling the mean vector

The choice of a mean function for a longitudinal trait is based on theory or past experience that suggests a certain mathematical form for the time-dependent mean. However, it would be essential to derive a general approach that can fit any kind of curve. By choosing different orders of orthogonal polynomials, the Legendre function has the potential to approximate the functional relationships between trait values and times to any specified degree of precision. The Legendre polynomials are solutions to a very important differential equation, the Legendre equation,

$$(1 - x^2)\frac{d^2 y}{dx^2} - 2x\frac{dy}{dx} + r(r + 1)\,y = 0$$

The polynomials may be denoted by Pr(x), called the Legendre polynomial of order r. The polynomials are either even or odd functions of x for even or odd orders r.

The general form of a Legendre polynomial of order r is given by the sum,

$$P_r(x) = \sum_{k=0}^{K} (-1)^k \frac{(2r - 2k)!}{2^r\, k!\, (r - k)!\, (r - 2k)!}\, x^{r - 2k}$$

where K = r/2 or (r - 1)/2, whichever is an integer. This polynomial is defined over the interval [-1, 1]. From this sum, the first few polynomials are

$$P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \tfrac{1}{2}(3x^2 - 1), \quad P_3(x) = \tfrac{1}{2}(5x^3 - 3x), \quad P_4(x) = \tfrac{1}{8}(35x^4 - 30x^2 + 3), \quad P_5(x) = \tfrac{1}{8}(63x^5 - 70x^3 + 15x)$$

In this model, the independent variable x represents time t, which is rescaled to the range of the orthogonal functions, [-1, 1], by

$$t' = -1 + \frac{2\,(t - t_{\min})}{t_{\max} - t_{\min}}$$

where tmin and tmax are respectively the first and last time points.
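The finite-sum form of the Legendre polynomial and the rescaling of measurement times can be checked numerically. The sketch below assumes the standard unnormalized Legendre polynomials and the usual linear map of times onto [-1, 1]; the function names `legendre_poly` and `rescale_times` are hypothetical and are not taken from the article.

```python
from math import factorial

import numpy as np


def legendre_poly(r: int, x: float) -> float:
    """Evaluate the Legendre polynomial of order r at x via the finite sum."""
    K = r // 2  # r/2 if r is even, (r - 1)/2 if r is odd
    total = 0.0
    for k in range(K + 1):
        coef = (-1) ** k * factorial(2 * r - 2 * k) / (
            2 ** r * factorial(k) * factorial(r - k) * factorial(r - 2 * k)
        )
        total += coef * x ** (r - 2 * k)
    return total


def rescale_times(t) -> np.ndarray:
    """Map measurement times linearly onto [-1, 1] (first time -> -1, last -> 1)."""
    t = np.asarray(t, dtype=float)
    return -1.0 + 2.0 * (t - t.min()) / (t.max() - t.min())


# Eleven yearly measurements, as in the poplar example of the Results section.
t_adj = rescale_times(range(1, 12))
print(t_adj)
print([legendre_poly(r, 0.5) for r in range(4)])
```

Evaluating the sum directly is adequate for the low orders considered here; for high orders a recurrence relation would be numerically safer.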

Our aim is to model the time-dependent genotypic values for different QTL genotypes j1j2, using the orthogonal Legendre polynomial with a particular order r. A family of such polynomials, evaluated at the rescaled time t', is denoted by

$$\mathbf{P}_r(t') = \left[ P_0(t'), P_1(t'), \ldots, P_r(t') \right]^{\mathrm{T}}$$

and a vector of genotypic values, which is time-independent, denoted by

$$\mathbf{b}_{j_1} = \left[ b_{j_1 0}, b_{j_1 1}, \ldots, b_{j_1 r} \right]^{\mathrm{T}}$$

The time-dependent genotypic values $u_{j_1}(t)$ can be described as a linear combination of $\mathbf{b}_{j_1}$ weighted by the family of the polynomials, i.e.,

$$u_{j_1}(t) = \mathbf{b}_{j_1}^{\mathrm{T}}\, \mathbf{P}_r(t') \qquad (3)$$

Substituting the mean vector of the likelihood (1) by the above expression (3), we will need to estimate time-invariant genotypic values for the longitudinal trait, $\mathbf{b}_{j_1}$, and the genotypic mean for the event trait, $v_{j_2}$.
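The linear combination of time-invariant genotypic values and Legendre polynomials can be sketched with NumPy's Legendre module. The coefficient values below are made up purely for illustration, the function name `genotypic_curve` is hypothetical, and the times are assumed to be rescaled linearly onto [-1, 1].

```python
import numpy as np
from numpy.polynomial import legendre


def genotypic_curve(coefs, times) -> np.ndarray:
    """Time-dependent genotypic values from time-invariant Legendre coefficients.

    coefs: r + 1 time-invariant genotypic values (orders 0..r).
    times: the T measurement times on their original scale.
    """
    times = np.asarray(times, dtype=float)
    # Rescale times to [-1, 1], the interval on which the polynomials are defined.
    t_adj = -1.0 + 2.0 * (times - times.min()) / (times.max() - times.min())
    # Columns of `basis` are P_0, ..., P_r evaluated at each adjusted time.
    basis = legendre.legvander(t_adj, len(coefs) - 1)
    return basis @ np.asarray(coefs, dtype=float)


# Hypothetical coefficients of order r = 2 for one QTL genotype.
curve = genotypic_curve([2.0, 1.0, 0.5], np.arange(1, 12))
print(curve)
```

Because the design matrix depends only on the measurement schedule, it can be computed once and reused for every genotype and every iteration of the estimation procedure.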

Modelling the covariance matrix

A general form for the covariance matrix among longitudinal trajectories and the developmental event in the likelihood (1) is expressed as

$$\Sigma = \begin{pmatrix} \Sigma_y & \Sigma_{yz} \\ \Sigma_{zy} & \sigma_z^2 \end{pmatrix} \qquad (4)$$

where Σy and $\sigma_z^2$ are the covariance matrix and variance for the longitudinal and event traits, respectively, and Σyz = $\Sigma_{zy}^{\mathrm{T}}$ is the covariance matrix between these two types of traits. The structures of Σy and Σyz can be empirically modelled on the basis of prior knowledge or results. Several approaches for parametric modelling of the covariance matrix, reviewed by Zimmerman and Núñez-Antón [20], can be utilized.

The most common approach for modelling the covariance structure is based on a variance-correlation specification, in which functions for the responses' variances and correlations are specified. In previous QTL mapping [10-14], the covariance structure for longitudinal traits was modelled by the simplest, most parsimonious and most flexible first-order autoregressive (AR(1)) model, in which there are two parameters, stationary variance ($\sigma_y^2$) and correlation (ρy). Relaxing the stationary variance assumption for growth data, Wu et al. [15] adopted a transform-both-sides (TBS) model to obtain an empirically homogeneous variance. The results from simulation studies suggest that the TBS-based mapping model provides more precise estimates for curve parameters and residual variance-correlation than the untransformed model.

The TBS-based model displays the potential to relax the assumption of variance stationarity, but the covariance stationarity issue remains unsolved. Zimmerman and Núñez-Antón [20] proposed a so-called structured antedependence (SAD) model to model the age-specific change of correlation in the analysis of longitudinal traits. The SAD model has been employed in several studies and displays many favorable properties [21,22].

The emergence of a developmental event (z) at time t* can be correlated with the longitudinal trait. For example, larger tumor sizes may be likely to lead to earlier malignancy of cancer than smaller tumor sizes. An AIDS patient would die when his/her HIV load accumulates to a particularly high level. In plants, first flowering only appears after some investment of vegetative growth. All such common knowledge suggests that the correlation between the event trait at time t* and longitudinal trait measured at time t (before t*) decays with time difference (t* - t). In fact, a similar pattern of correlation should also hold for t > t* because of the nature of autocorrelation. With all this consideration, the correlation between the event and longitudinal traits can be modelled by the power equation, expressed as

[Equation (5): http://www.biomedcentral.com/1471-2105/7/138/mathml/M19]

where 0 ≤ η ≤ 1. Equation (5) suggests that the event is correlated with the longitudinal trait, to the same extent, before and after its emergence. The event trait should be individual-specific when it is the timing of development, such as the time to first flower. In this case, Equation (5) and, therefore, the covariance matrix (4), expressed as Σi, should be individual-specific. If Σy is modelled by the AR(1) model, one can derive the explicit expressions of the determinant and inverse of Σi specified by ($\sigma_y^2$, ρy, $\sigma_z^2$, η, λ).
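The block structure of the covariance matrix (4) can be illustrated with a short Python sketch. It does not implement Equation (5): as a simplifying assumption it uses an AR(1) matrix for the longitudinal trait and a single correlation η between the event trait and the longitudinal trait at every time point, so the parameter λ and any individual-specific structure are ignored. The function and argument names are hypothetical.

```python
import numpy as np


def joint_covariance(T: int, var_y: float, rho_y: float,
                     var_z: float, eta: float) -> np.ndarray:
    """Assemble a (T + 1) x (T + 1) covariance matrix in the block form of (4).

    Sigma_y  : AR(1), var_y * rho_y ** |s - t|.
    Sigma_yz : constant correlation eta between the event trait and every
               longitudinal measurement (an assumption of this sketch).
    """
    idx = np.arange(T)
    sigma_y = var_y * rho_y ** np.abs(idx[:, None] - idx[None, :])
    sigma_yz = np.full((T, 1), eta * np.sqrt(var_y * var_z))
    top = np.hstack([sigma_y, sigma_yz])
    bottom = np.hstack([sigma_yz.T, np.array([[var_z]])])
    return np.vstack([top, bottom])


sigma = joint_covariance(T=4, var_y=2.0, rho_y=0.5, var_z=1.0, eta=-0.3)
# A valid covariance matrix must be positive definite; check before using it.
print(np.linalg.eigvalsh(sigma).min() > 0)
```

Not every combination of ρy and η yields a positive definite matrix, so a check of this kind is advisable inside any numerical optimizer that searches over these parameters.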

Computational algorithms

The unknown parameters (Ω) contained in the mixture model (1) are of three types: the QTL-marker recombination fractions for a pedigree, or QTL-marker linkage disequilibria for a natural population, reflected in the conditional probabilities $\omega_{j_1 j_2 \mid i}$; the curve parameters that model the mean vector (the time-invariant genotypic values for the longitudinal trait and the genotypic means for the event trait); and the parameters ($\sigma_y^2$, ρy, $\sigma_z^2$, η, λ) that model the structure of the covariance matrix. We derived the EM algorithm to estimate these parameters. Using the (prior) conditional probability $\omega_{j_1 j_2 \mid i}$ and the multivariate normal density $f_{j_1 j_2}$ of the likelihood, we define the posterior probability for individual i to carry QTL genotype j1j2 as

$$\Pi_{j_1 j_2 \mid i} = \frac{\omega_{j_1 j_2 \mid i}\, f_{j_1 j_2}(\mathbf{y}_i, z_i; \Omega)}{\sum_{j_1'=0}^{2} \sum_{j_2'=0}^{2} \omega_{j_1' j_2' \mid i}\, f_{j_1' j_2'}(\mathbf{y}_i, z_i; \Omega)} \qquad (6)$$

The posterior probabilities are then used to derive closed-form maximum likelihood estimates of the QTL locations, expressed as the ratio of recombination fractions, for linkage analysis or QTL-marker haplotype frequencies for linkage disequilibrium analysis [17]. For functional mapping, in which the mean vectors and covariance matrix are modelled by mathematical parameters based on non-linear equations, it is impossible to derive closed forms for these parameters. The simplex algorithm, widely used in operations research, is found to provide a fast and precise estimation of the curve parameters and the parameters that model the residual covariances [23]. Thus, we implement the simplex algorithm in the maximization process of the EM algorithm.
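The expectation step, in which prior genotype probabilities are updated to posterior probabilities, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes one covariance matrix shared by all individuals (the text allows an individual-specific Σi), strictly positive prior probabilities, and joint genotypes indexed by a single label j; the names `mvn_logpdf` and `e_step` are hypothetical.

```python
import numpy as np


def mvn_logpdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Log density of a multivariate normal distribution at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)


def e_step(data: np.ndarray, priors: np.ndarray,
           means: np.ndarray, cov: np.ndarray):
    """Posterior genotype probabilities and the mixture log-likelihood.

    data   : (n, T + 1) longitudinal measurements followed by the event time.
    priors : (n, J) conditional genotype probabilities given the markers.
    means  : (J, T + 1) genotype-specific mean vectors.
    cov    : (T + 1, T + 1) covariance matrix shared by all individuals.
    """
    n, J = priors.shape
    log_f = np.array([[mvn_logpdf(data[i], means[j], cov) for j in range(J)]
                      for i in range(n)])
    log_w = np.log(priors) + log_f
    # Log-sum-exp over genotypes keeps the computation numerically stable.
    m = log_w.max(axis=1, keepdims=True)
    log_mix = m[:, 0] + np.log(np.exp(log_w - m).sum(axis=1))
    posterior = np.exp(log_w - log_mix[:, None])
    return posterior, float(log_mix.sum())


# Toy data: two individuals, two genotypes, one time point plus the event trait.
data = np.array([[0.0, 0.0], [5.0, 5.0]])
priors = np.full((2, 2), 0.5)
means = np.array([[0.0, 0.0], [5.0, 5.0]])
posterior, loglik = e_step(data, priors, means, np.eye(2))
print(posterior, loglik)
```

In the maximization step, the returned log-likelihood (or its posterior-weighted analogue) would be passed to a numerical optimizer such as the simplex method mentioned in the text.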

For linkage analysis based on an experimental cross, the conditional probabilities of QTL genotypes are expressed in terms of the recombination fractions between the QTL and two flanking markers. In practical computations, the QTL position parameter can be viewed as a fixed parameter because a putative QTL can be searched at every 1 or 2 cM on a map interval bracketed by two markers throughout the entire genome. The amount of support for a QTL at a particular map position is often displayed graphically through the use of likelihood maps or profiles, which plot the likelihood ratio test statistic as a function of map position of the putative QTL. The peak of the profile corresponds to the position of the QTL over the genome.

For linkage disequilibrium analysis of a natural population, we have derived a closed form for the EM algorithm to estimate QTL-marker haplotype frequencies. From the estimated haplotype frequencies, the allele frequencies of QTL and QTL-marker linkage disequilibria can be estimated. Whether the markers are associated with the underlying QTL in the population can then be assessed by testing the significance of the QTL-marker linkage disequilibria.

After the point estimates of parameters are obtained by the EM algorithm, the approximate variance-covariance matrix and the sampling errors of the estimates ($\hat{\Omega}$) can be estimated. The techniques for doing so involve calculation of the incomplete-data information matrix, which is the negative second-order derivative of the incomplete-data log-likelihood. The incomplete-data information can be calculated by extracting the information for the missing data from the information for the complete data [24].

Order selection

For a QTL to be detected, we need to determine the optimal order for the Legendre polynomial that fits the data. We propose using the Akaike information criterion (AIC) to select the best model. The AIC value at a particular order, r, is calculated by

$$\mathrm{AIC} = -2 \ln L(\hat{\Omega} \mid r) + 2\,\mathrm{dimension}(\Omega \mid r), \qquad (7)$$

where $\hat{\Omega} \mid r$ is the MLE of parameters for the Legendre polynomial of order r and dimension(Ω|r) represents the number of independent parameters under order r.

The Bayesian information criterion (BIC) [25] is also used to determine the optimal order of the Legendre function, which is calculated by

$$\mathrm{BIC} = -2 \ln L(\hat{\Omega} \mid r) + 2\,\mathrm{dimension}(\Omega \mid r) \ln(nT). \qquad (8)$$

As compared to AIC, BIC adjusts for the effects of sample size and the number of time points measured.
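Order selection by these criteria amounts to a few lines of arithmetic. The sketch below follows Equations (7) and (8) as they are written in the text, including the factor 2 in the BIC penalty; the log-likelihood values and parameter counts in the example are invented for illustration only, and the function names are hypothetical.

```python
import math


def aic(loglik: float, n_params: int) -> float:
    """AIC as in Equation (7)."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik: float, n_params: int, n: int, T: int) -> float:
    """BIC in the form of Equation (8), with penalty 2 * n_params * ln(nT)."""
    return -2.0 * loglik + 2.0 * n_params * math.log(n * T)


def select_order(fits: dict, criterion) -> int:
    """Return the order whose fit minimizes the given criterion.

    fits maps each candidate order r to (maximized log-likelihood, n_params).
    """
    return min(fits, key=lambda r: criterion(*fits[r]))


# Hypothetical fits for orders 3..6 (values invented for illustration).
fits = {3: (-120.0, 8), 4: (-110.0, 9), 5: (-100.0, 10), 6: (-99.8, 11)}
best_aic = select_order(fits, aic)
best_bic = select_order(fits, lambda ll, k: bic(ll, k, n=90, T=11))
print(best_aic, best_bic)
```

Because the BIC penalty grows with ln(nT), it can favor a lower order than AIC when an additional coefficient improves the likelihood only slightly.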

Hypothesis tests

Our model allows for a number of hypothesis tests to examine the genetic control of growth processes [14]. All these tests are helpful to address biological questions related to the genetic control mechanisms of growth. Testing whether specific QTL exist to affect the longitudinal and event processes is a first step toward the understanding of the detailed genetic architecture of complex phenotypes. This can be tested by formulating the following hypotheses,

$$H_0: \mathbf{u}_{j_1} = \mathbf{u} \ \text{and} \ v_{j_2} = v \quad \text{vs.} \quad H_1: \text{Not all equalities in } H_0 \text{ hold,} \qquad (9)$$

where $\mathbf{u}_{j_1}$ and $v_{j_2}$ are the genotype-specific means of the longitudinal and event traits, and $\mathbf{u}$ and v are their common values in the absence of a QTL.

The H0 states that there is no QTL affecting longitudinal and event processes (the reduced model), whereas the H1 proposes that such a QTL does exist (the full model). The test statistic for testing the hypotheses is calculated as the log-likelihood ratio of the reduced to the full model:

$$\mathrm{LR} = -2\left[\ln L_0(\tilde{\Omega} \mid y, z) - \ln L_1(\hat{\Omega} \mid y, z, M)\right], \qquad (10)$$

where $\tilde{\Omega}$ and $\hat{\Omega}$ denote the MLEs of the unknown parameters under H0 and H1, respectively. Because the LR calculated by equation (10) may not be asymptotically χ2-distributed with eleven degrees of freedom due to violation of regularity conditions, an empirical approach for determining the critical threshold based on permutation tests is used. By repeatedly shuffling the relationships between marker genotypes and phenotypes, a series of the maximum log-likelihood ratios are calculated, from the distribution of which the critical threshold is determined.
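The permutation procedure for the critical threshold can be sketched generically. In the code below, `max_lr` stands for a user-supplied function (hypothetical, not defined in the article) that scans the genome and returns the maximum LR for a given pairing of phenotypes and markers; the toy statistic in the usage example is only a stand-in so that the sketch runs.

```python
import numpy as np


def permutation_threshold(phenotypes: np.ndarray, markers: np.ndarray, max_lr,
                          n_perm: int = 100, alpha: float = 0.05,
                          seed: int = 0) -> float:
    """Empirical critical value of the genome-wide maximum LR.

    Rows of `phenotypes` (longitudinal and event traits of one individual) are
    shuffled together, so the correlation among traits is preserved while the
    link between phenotypes and marker genotypes is broken.
    """
    rng = np.random.default_rng(seed)
    n = phenotypes.shape[0]
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(n)
        null_stats[b] = max_lr(phenotypes[perm], markers)
    # The (1 - alpha) quantile of the permuted maxima is the threshold.
    return float(np.quantile(null_stats, 1.0 - alpha))


def toy_max_stat(ph: np.ndarray, mk: np.ndarray) -> float:
    """Stand-in statistic: largest absolute marker-trait correlation."""
    return max(abs(np.corrcoef(ph[:, 0], mk[:, k])[0, 1])
               for k in range(mk.shape[1]))


ph = np.random.default_rng(1).normal(size=(40, 2))
mk = np.array([[i % 2, (i // 2) % 2] for i in range(40)], dtype=float)
print(permutation_threshold(ph, mk, toy_max_stat, n_perm=50))
```

With a small number of permutations the upper quantile is estimated coarsely, so the resulting threshold should be read as approximate.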

After the QTL are detected to be significant for both longitudinal and event traits, we need to test whether the detected QTL are significant separately for each trait. We assume two different genetic settings:

(1) Longitudinal and event traits are under control of the same QTL;

(2) Each process is controlled by different QTL that are linked on the same chromosomal region.

For the first setting, only after it is significant in two separate tests for longitudinal and event traits can the tested QTL be thought to be pleiotropic in affecting both types of traits. For the second setting, we assume that two tested QTL are located in the same interval bracketed by two markers. The comparison of the first and second setting can examine how the detected QTL jointly affect the differentiation in longitudinal and event traits. First, we can test how two genetic mechanisms, pleiotropy or close linkage, contribute to the correlation between these two types of traits. If the two QTL are detected to be significant for both, we then test whether such a correlation is due to pleiotropy or close linkage. Second, when two QTL exist, we can test how they epistatically interact to affect longitudinal trajectories and developmental events. Wu et al. [14] formulated a procedure for testing the epistatic effects on developmental trajectories.

Results

The proposed joint model is used to analyze growth trajectories and flowering behavior in a forest tree. The study material used was derived from the interspecific hybridization of Populus (poplar), P. deltoides and P. euramericana. This hybrid population was planted at a spacing of 4 × 5 m in a completely randomized design in a field trial near Xuzhou City, Jiangsu Province, China. The total stem heights and diameters measured at the end of each of the first 11 growing seasons are used to calculate stem volume indices (y) for QTL analysis. Because the vegetative and reproductive growth processes are generally correlated in plants [26], the ages to first flower (z) were predicted by a regression equation for each of these hybrids. Two genetic linkage maps, each based on a different parent, were constructed for a subset of hybrids (90) with different types of molecular markers that are segregating in a pattern of pseudo-test backcross [27]. Our analysis here will be based on the P. deltoides (D)-specific linkage map.

Although stem height and diameter for each tree follow a logistic curve [10], the stem volume index derived from these two traits cannot be fit by the growth equation, mainly because stem volume has not yet reached its asymptotic growth during this measurement period (Fig. 1A). As shown by Figure 1, the variance of the stem volume index increases markedly with age, but the log-transformation of these indices leads to much more parallel curves (Fig. 1B), suggesting that the variance stationarity assumption may be met after the transformation.

Figure 1. Plots of stem volume index growth vs. ages for each of the 90 genotypes used to construct linkage maps in poplar hybrids (Yin et al. 2002). The relationships between growth and age are displayed for untransformed (A) and log-transformed data (B).

Figure 2. The profile of the log-likelihood ratios between the full (there is a QTL) and reduced (there is no QTL) model that combines stem volume index growth trajectories and flower timing across linkage groups in the Populus deltoides parent map. The genomic positions corresponding to the peaks of the curve are the MLEs of the QTL localization. The threshold values for claiming the existence of QTL are given as the horizontal solid lines for the genome-wide level and broken lines for the chromosome-wide level. Blue color corresponds to the unifying model for jointly mapping growth trajectories and flower trait, whereas red color corresponds to a model for mapping growth trajectories only. The positions of markers on the linkage groups (Yin et al. 2002) are indicated at ticks.

We implemented the Legendre function to model the QTL genotypic mean vector of growth trajectories and the TBS-based AR(1) model to approximate the structure of the covariance matrix. The joint model also allows the estimation of the genotypic means and residual variance for the age to first flower, as well as the correlation of this trait with stem wood growth trajectories. Equation (5) provides a general equation for modelling the correlation between the event and longitudinal traits measured at different ages. In this example, it was observed that there were significant correlations between the age to first flower and volume growth at all different ages (ranging from -0.29 to -0.82). Thus, for simplicity of computation, we assume that such correlations are consistent across the ages of volume index, denoted as η. Using the adjusted ages, we calculated the coefficients of the Legendre polynomials for the first seven orders (Table 1). These coefficients are used to estimate time-dependent genotypic values. The AIC and BIC values calculated consistently suggested an optimal order of 5 to fit stem volume growth (Table 2).

Table 1. Coefficients of the first five Legendre polynomials for adjusted time points (t*) used in the poplar growth study.
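Values of the kind summarized in Table 1 can be computed with a few lines of code. The sketch below assumes that the adjusted time t* is obtained by rescaling the measurement ages linearly onto [-1, 1], the interval on which Legendre polynomials are orthogonal, and it evaluates the polynomials with the standard Bonnet recurrence. The ages and the genotype-specific coefficients are hypothetical and are not taken from the poplar data.

```python
def adjust_times(ages):
    """Rescale measurement ages linearly onto [-1, 1] (assumed adjustment)."""
    lo, hi = min(ages), max(ages)
    return [-1.0 + 2.0 * (a - lo) / (hi - lo) for a in ages]


def legendre_values(x, max_order):
    """Return [P_0(x), ..., P_max_order(x)] via Bonnet's recurrence."""
    values = [1.0, x][: max_order + 1]
    for n in range(1, max_order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


ages = list(range(1, 12))          # hypothetical yearly measurement ages
t_star = adjust_times(ages)        # adjusted time points on [-1, 1]
basis = [legendre_values(t, 5) for t in t_star]  # one row per age, orders 0..5

# Time-dependent genotypic mean as a weighted sum of the basis columns.
coefficients = [2.0, 1.5, -0.4, 0.1, 0.0, 0.05]  # hypothetical values
genotypic_mean = [sum(c * p for c, p in zip(coefficients, row)) for row in basis]
```

Each row of `basis` plays the role of one column of adjusted-time coefficients, and a genotype's mean curve is fully described by the short vector `coefficients`.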

Table 2. The AIC and BIC values used to determine the optimal order for the Legendre polynomials.
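Order selection by information criteria can be illustrated with a deliberately simplified stand-in for the full model. The sketch below fits Legendre polynomials of increasing order to a single simulated curve by ordinary least squares with independent Gaussian errors, then computes AIC = -2 log L + 2k and BIC = -2 log L + k log n. The joint mixture likelihood and the AR(1) covariance of the actual model are not reproduced here, and the simulated data and helper names are hypothetical.

```python
import math
import random


def legendre_row(x, order):
    """[P_0(x), ..., P_order(x)] via Bonnet's recurrence."""
    row = [1.0, x][: order + 1]
    for n in range(1, order):
        row.append(((2 * n + 1) * x * row[n] - n * row[n - 1]) / (n + 1))
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def information_criteria(t_star, y, order):
    """Return (AIC, BIC) for a least-squares Legendre fit of the given order."""
    design = [legendre_row(t, order) for t in t_star]
    p = order + 1
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(design, y))
    n = len(y)
    neg2loglik = n * (math.log(2 * math.pi * rss / n) + 1)  # Gaussian MLE
    k = p + 1  # regression coefficients plus the residual variance
    return neg2loglik + 2 * k, neg2loglik + k * math.log(n)


# Simulated curve: a hypothetical cubic Legendre mean plus small noise.
rng = random.Random(0)
t_star = [-1.0 + 2.0 * i / 29 for i in range(30)]
true_coef = [1.0, 0.5, -0.8, 2.0]
y = [sum(c * p for c, p in zip(true_coef, legendre_row(t, 3))) + rng.gauss(0.0, 0.1)
     for t in t_star]

scores = {order: information_criteria(t_star, y, order) for order in range(7)}
best_by_bic = min(scores, key=lambda order: scores[order][1])
```

Both criteria penalize the log-likelihood for the number of parameters, so the selected order is the smallest one beyond which additional Legendre terms no longer improve the fit enough to justify their cost.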

While our joint model was derived to detect two QTL at a time, it was reduced to a one-QTL model because the sample size was too small to estimate the parameters of a two-QTL model reliably. For the pseudo-test backcross there are two genotypes, Qq (j = 1) and qq (j = 0), at each QTL. Figure 2 illustrates the profile of the log-likelihood ratio (LR) values for testing the existence of QTL that control either overall growth curves of stem volume indices from age 0 to 10 years or the ages to first flower, or both, across all of the 19 D-specific linkage groups. We performed 100 permutation tests to determine critical threshold values for declaring the existence of QTL. By comparing the peaks of the LR profile with the thresholds, three significant QTL were detected, one on linkage group 2 at the 5% genome-wide testing level and two on linkage groups 5 and 12 at the 5% chromosome-wide testing level (Fig. 2; Table 3). The positions of these QTL on the linkage groups correspond to the peaks of the LR profile. If only stem volume growth is analyzed using traditional functional mapping [10], only the QTL on linkage group 2 is detected, suggesting that the joint model has greater power than a single-trait analysis.
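The logic of a permutation-based threshold can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used for Figure 2: the LR statistic here compares a two-genotype mean model with a single-mean model for one scalar trait at each marker, whereas the paper's LR comes from the joint mixture likelihood. The simulated markers, effect size and function names are hypothetical.

```python
import math
import random


def lr_statistic(trait, genotype):
    """LR for 'genotype means differ' vs 'one common mean' (normal errors)."""
    n = len(trait)
    grand = sum(trait) / n
    rss0 = sum((y - grand) ** 2 for y in trait)
    rss1 = 0.0
    for g in (0, 1):  # backcross: two genotypes, qq (0) and Qq (1)
        group = [y for y, gi in zip(trait, genotype) if gi == g]
        mean_g = sum(group) / len(group)
        rss1 += sum((y - mean_g) ** 2 for y in group)
    return n * math.log(rss0 / rss1)


def permutation_threshold(trait, markers, n_perm=100, level=0.05, seed=1):
    """Empirical (1 - level) quantile of the genome-wide maximum LR."""
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any trait-genotype association
        maxima.append(max(lr_statistic(shuffled, m) for m in markers))
    maxima.sort()
    return maxima[math.ceil((1 - level) * n_perm) - 1]


# Simulated data: 90 trees, 5 unlinked markers, a real effect at marker 0 only.
rng = random.Random(42)
n_trees, n_markers = 90, 5
markers = [[rng.randint(0, 1) for _ in range(n_trees)] for _ in range(n_markers)]
trait = [3.0 * g + rng.gauss(0.0, 1.0) for g in markers[0]]

observed = [lr_statistic(trait, m) for m in markers]
threshold = permutation_threshold(trait, markers, n_perm=100)
significant = [i for i, lr in enumerate(observed) if lr > threshold]
```

Taking the maximum over all markers within each permutation is what makes the threshold genome-wide; restricting the maximum to the markers of one linkage group would give a chromosome-wide threshold instead.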

Table 3. The MLEs and their sampling errors (SE, in parentheses) of the QTL position, time-invariant QTL effects on growth curves (expressed in the Legendre polynomials), QTL effect on the time to first flower, residual variance and residual correlation under the log-transformed model for the interspecific poplar hybrid mapping population.

Each of the three QTL was tested for a pleiotropic effect on both vegetative growth and reproduction by formulating two independent null hypotheses, one being that the QTL does not affect stem growth and the other being that the QTL does not affect flowering age. The rejection of both null hypotheses implies that a QTL has a pleiotropic effect on growth and reproduction. As indicated by Table 3, all the detected QTL on linkage groups 2, 5 and 12 have a significant effect on stem volume growth, but none has a significant effect on both growth and reproduction.

The MLEs of growth parameters for stem volume indices, covariance-structuring parameters and the parameters dealing with reproductive behaviors, as well as their standard errors estimated from the Fisher information matrix, are tabulated in Table 3. The estimates of all the parameters from our joint model have reasonable precision. Using the estimates of growth curves, we drew two different curves, each corresponding to a genotype, at each of the detected QTL (Fig. 3). Note that growth curves were first drawn from the estimates of the Legendre parameters (the left panel of Fig. 3) and then transformed back to the normal scale (the right panel of Fig. 3). In general, these QTL are switched on to affect the overall stem growth process after age 4–5 years, when strong inter-tree competition sets in within the stand because of canopy closure. Figure 3 also displays genotypic differences in the age to first flower at each of the growth QTL. But as tested, only the QTL on linkage group 12 has a significant impact on the age to first flower (Table 3). At this QTL, the slower-growing genotype flowers about 0.7 year earlier than the faster-growing genotype. Through this QTL, the fast-growing attribute and the capacity to efficiently occupy growth resources can be transmitted to the next generation.
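The two-step drawing of genotype-specific curves described above can be sketched as follows, assuming that the log transformation is the natural logarithm and that time has been rescaled onto [-1, 1]. The Legendre coefficients for the two genotypes are hypothetical values chosen for illustration and are not the estimates in Table 3.

```python
import math


def legendre_values(x, max_order):
    """[P_0(x), ..., P_max_order(x)] via Bonnet's recurrence."""
    values = [1.0, x][: max_order + 1]
    for n in range(1, max_order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def log_scale_curve(coefficients, t_star):
    """Genotypic mean on the log scale at each adjusted time point."""
    order = len(coefficients) - 1
    return [sum(c * p for c, p in zip(coefficients, legendre_values(t, order)))
            for t in t_star]


t_star = [-1.0 + 0.2 * i for i in range(11)]  # adjusted times on [-1, 1]

# Hypothetical Legendre coefficients for the two backcross genotypes.
coef_Qq = [3.0, 2.0, -0.6, 0.2]
coef_qq = [2.8, 1.8, -0.6, 0.2]

log_Qq = log_scale_curve(coef_Qq, t_star)     # analogue of the left panel
log_qq = log_scale_curve(coef_qq, t_star)
back_Qq = [math.exp(v) for v in log_Qq]       # analogue of the right panel
back_qq = [math.exp(v) for v in log_qq]
```

A small, nearly constant gap on the log scale becomes a gap that widens with size on the original scale, which is one reason genotypic differences can look negligible early and pronounced later. Under a log-normal assumption the back-transformed curve is the median, not the mean, on the original scale.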

Figure 3. Volume growth curves for two different QTL genotypes for the QTL detected on linkage group 2 by the Legendre polynomial-based model. Left panel: log-transformed curves; Right panel: back-transformed curves. Growth trajectories for all the individuals studied are shown in yellow in the background. The effect of the detected QTL on the time to first flower is indicated.

Discussion

A theoretical framework has been constructed for functional mapping of quantitative trait loci (QTL) underlying longitudinal growth [10-15]. Functional mapping is grounded in the biological reality that every organism follows universal growth laws that can be derived from fundamental principles for the allocation of metabolic energy between maintenance of existing tissue and the production of new biomass [17]. Coupled with linkage disequilibrium mapping, functional mapping has been extended to map host QTL for HIV dynamics in a natural human population [16].

Although functional mapping has proven to be both biologically and statistically advantageous in terms of the estimates of the QTL positions and effects, its practical applications may be limited for two reasons. First, a longitudinal variable, such as HIV dynamics, tumor growth or plant vegetative growth, may be related to time-to-events, like time to onset of AIDS symptoms, time to first malignancy or time to first flower, through a common set of QTL [28,29]. Second, not all longitudinal data measured at a series of discrete time points can be fit by a mathematical function with biological meaning.

In this article, we have proposed a joint model for functional mapping of longitudinal trajectories and time-to-events in a nonparametric context. Several statistical models have been proposed to jointly analyze longitudinal and event processes [1-9]. Unlike those traditional models, our joint model has been constructed within the mixture model framework, with each mixture component assigned a biological rationale. We incorporated Legendre polynomials to characterize an arbitrary form of growth curves. In a real example for a forest tree, the model has detected a few QTL that affect growth processes and the age to first flower. The detection of a common genetic basis for vegetative and reproductive growth supports the view that no developmental event is isolated from the growth process [28,29]. Our model provides a complete genetic analysis of growth courses for various organisms at different organization levels. From a statistical perspective, it increases the power of QTL detection and the precision of parameter estimation because the information about growth and development is jointly utilized. Meanwhile, our model allows for the test of several important hypotheses regarding the genetic control of developmental events occurring from fertilized ovum to reproductive maturity.

Our model is based on nonparametric Legendre orthogonal polynomial approaches for growth and development processes. Orthogonal polynomials (including Legendre) have been extensively used in random regression analyses for longitudinal traits with repeated records [18,30-32]. Legendre polynomials have several favorable properties for curve fitting: (1) the functions are orthogonal, (2) they are flexible enough to fit sparse data, (3) higher orders are estimable for high levels of curve complexity and (4) computation is fast because of good convergence. Nonparametric regression methods for modelling the mean structure of longitudinal data have been based on the more commonly used B-spline basis functions [33]. Brown et al. [9] extended the B-spline basis to model multiple longitudinal variables. Compared with the B-spline approach, which constructs curves from pieces of lower-degree polynomials joined smoothly at selected points (knots), Legendre polynomials are simpler in that fewer regression coefficients are needed to model the curve. However, polynomials often overemphasize the observations at the extremes and may be problematic for high orders of fit due to oscillations at the extremes of the curve [34]. It is therefore worthwhile to implement more flexible B-spline basis functions in the nonparametric functional mapping model.

In our joint model, we assumed that the time-to-event is multivariate normally distributed together with longitudinal data (see also [35]). An alternative way to model the distribution of longitudinal and event data is to take the product of the normal distribution function of longitudinal trajectories and the distribution of the event trait and censoring indicator given the trajectory function [9,36]. In addition, our model should be extended to consider multiple longitudinal variables based on a framework by Lin et al. [5], multiple time-to-events [8] and structured covariance matrices among unbalanced repeated measures [37]. Such extensions will be essential for unravelling the genetic architecture of complex phenotypes that are characterized by a network of biological processes. With appropriate improvements, our joint model will have great power to unlock the genetic secrets hidden in various complicated and biologically realistic life processes.

Conclusion

We have developed a joint statistical model that can detect specific QTL governing both longitudinal traits and developmental processes through either pleiotropic effects or close linkage, or both. The model incorporates nonparametric approaches that do not rely on mathematical equations to model growth curves. The model will have great implications for integrating longitudinal and event data to gain better insights into comprehensive biology and biomedicine.

Authors' contributions

ML derived the models, programmed the method and performed data analyses. RW conceived the idea and drafted the manuscript.

Figure 4. Volume growth curves for two different QTL genotypes for the QTL detected on linkage group 5 by the Legendre polynomial-based model. See Figure 3 for all the explanations.

Figure 5. Volume growth curves for two different QTL genotypes for the QTL detected on linkage group 12 by the Legendre polynomial-based model. See Figure 3 for all the explanations.

Acknowledgements

We thank the two anonymous referees for their constructive comments on this manuscript. The preparation of this manuscript has been partially supported by NSF grant (0540745) and NIH grant (R01 NS041670) to R. W. The publication of this manuscript is approved as journal series No.-10579 by the Florida Agricultural Experimental Station.

References

  1. Tsiatis AA, DeGruttola V, Wulfsohn MS: Modeling the relationship of survival to longitudinal data measured with error: Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association 1995, 90:27-37.

  2. Wulfsohn MS, Tsiatis AA: A joint model for survival and longitudinal data measured with error. Biometrics 1997, 53:330-339.

  3. Henderson R, Diggle P, Dobson A: Joint modeling of longitudinal measurements and event time data. Biostatistics 2000, 4:465-480.

  4. Song X, Davidian M, Tsiatis AA: A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics 2002, 58:742-753.

  5. Lin HQ, McCulloch CE, Mayne ST: Maximum likelihood estimation in the joint analysis of time-to-event and multiple longitudinal variables. Statistics in Medicine 2002, 21:2369-2382.

  6. Lin HQ, Turnbull BW, McCulloch CE, Slate EH: Latent class models for joint analysis of longitudinal biomarker and event process data. Journal of the American Statistical Association 2002, 457:53-65.

  7. Tsiatis AA, Davidian M: Joint modeling of longitudinal and time-to-event data: An overview. Statistica Sinica 2004, 14:809-834.

  8. Chi YY, Ibrahim JG: Joint models for multivariate longitudinal and multivariate survival data. Biometrics 2005, 61:000-000.

  9. Brown EB, Ibrahim JG, DeGruttola V: A flexible B-spline model for multiple longitudinal biomarkers and survival. Biometrics 2005, 61:64-73.

  10. Ma CX, Casella G, Wu RL: Functional mapping of quantitative trait loci underlying the character process: A theoretical framework. Genetics 2002, 161:1751-1762.

  11. Wu RL, Ma CX, Chang M, Littell RC, Wu SS, Huang M, Wang M, Casella G: A logistic mixture model for characterizing genetic determinants causing differentiation in growth trajectories. Genetical Research 2002, 19:235-245.

  12. Wu RL, Ma CX, Zhao W, Casella G: Functional mapping of quantitative trait loci underlying growth rates: A parametric model. Physiological Genomics 2003, 14:241-249.

  13. Wu RL, Ma CX, Yang MCK, Chang M, Santra U, Wu SS, Huang M, Wang M, Casella G: Quantitative trait loci for growth in Populus. Genetical Research 2003, 81:51-64.

  14. Wu RL, Ma CX, Lin M, Casella G: A general framework for analyzing the genetic architecture of developmental characteristics. Genetics 2004, 166:1541-1551.

  15. Wu RL, Ma CX, Lin M, Wang ZH, Casella G: Functional mapping of growth QTL using a transform-both-sides logistic model. Biometrics 2004, 60:729-738.

  16. Wang ZH, Wu RL: A statistical model for high-resolution mapping of quantitative trait loci determining human HIV-1 dynamics. Statistics in Medicine 2004, 23:3033-3051.

  17. West GB, Brown JH, Enquist BJ: A general model for ontogenetic growth. Nature 2001, 413:628-631.

  18. Kirkpatrick M, Heckman N: A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. Journal of Mathematical Biology 1989, 27:429-450.

  19. Lander ES, Botstein D: Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 1989, 121:185-199.

  20. Zimmerman DL, Núñez-Antón V: Parametric modeling of growth curve data: An overview (with discussions). Test 2001, 10:1-73.

  21. Jaffrézic F, Thompson R, Hill WG: Structured antedependence models for genetic analysis of repeated measures on multiple quantitative traits. Genetical Research 2003, 82:55-65.

  22. Zhao W, Chen YQ, Casella G, Cheverud JM, Wu RL: A nonstationary model for functional mapping of complex traits. Bioinformatics 2005, 21:2469-2477.

  23. Zhao W, Wu RL, Ma CX, Casella G: A fast algorithm for functional mapping of complex traits. Genetics 2004, 167:2133-2137.

  24. Louis TA: Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society Series B 1982, 44:226-233.

  25. Schwarz G: Estimating the dimension of a model. Annals of Statistics 1978, 6:461-464.

  26. Kozlowski TT, Pallardy SG: Acclimation and adaptive responses of woody plants to environmental stresses. Botanical Review 2002, 68:270-334.

  27. Yin TM, Zhang XY, Huang MR, Wang MX, Zhuge Q, Tu SM, Zhu LH, Wu RL: The molecular linkage maps of the Populus genome. Genome 2002, 45:541-555.

  28. Ambros V: Control of developmental timing in Caenorhabditis elegans. Current Opinion in Genetics and Development 2000, 10:428-433.

  29. Rougvie AE: Control of developmental timing in animals. Nature Reviews Genetics 2001, 2:690-701.

  30. Schaeffer LR: Application of random regression models in animal breeding. Livestock Production Science 2004, 86:35-45.

  31. Meyer K: Estimates of genetic covariance functions for growth of Angus cattle. Journal of Animal Breeding and Genetics 2005, 122:73-85.

  32. Meyer K: Random regression analyses using B-splines to model growth of Australian Angus cattle. Genetics Selection Evolution 2005, 37:473-500.

  33. Rice JA, Wu CO: Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 2001, 57:253-259.

  34. de Boor C: A Practical Guide to Splines. 2nd edition. Springer-Verlag; 2001.

  35. DeGruttola V, Tu XM: Modeling progression of CD4-lymphocyte count and its relationship to survival-time. Biometrics 1994, 50:1003-1014.

  36. Jacqmin-Gadda H, Thiebaut R, Chene G, Commenges D: Analysis of left-censored longitudinal data with application to viral load in HIV infection. Biostatistics 2000, 1:355-368.

  37. Jennrich RI, Schluchter MD: Unbalanced repeated-measures models with structured covariance matrices. Biometrics 1986, 42:805-820.