Abstract
Background
Many dichotomous traits for complex diseases involve more than one locus and/or are associated with quantitative biomarkers or environmental factors. Incorporating these quantitative variables into linkage analysis, as well as localizing two linked disease loci simultaneously, could therefore improve the efficiency of gene mapping. We extended the previously developed robust multipoint identity-by-descent (IBD) approach with incorporation of covariates to simultaneously estimate two linked loci using different types of affected relative pairs (ARPs).
Results
We showed that the efficiency was enhanced by incorporating a quantitative covariate parametrically or nonparametrically while localizing two disease loci using ARPs. In addition to helping identify factors associated with the disease and improving the efficiency of estimating disease loci, this extension also allows investigators to account for heterogeneity in risk ratios across different ARPs. Data released from the collaborative study on the genetics of alcoholism (COGA) for Genetic Analysis Workshop 14 (GAW 14) were used to illustrate the application of this extended method.
Conclusions
The simulation studies and the data example illustrated that the efficiency in estimating disease loci was demonstrably enhanced by incorporating a quantitative covariate and by using all relative pairs while mapping two linked loci simultaneously.
Background
With the advance of genotyping techniques, genome-wide association analysis has become the mainstream technique in genetic mapping. However, studies have shown that using information from linkage scans can improve the power of association mapping in genome scans [1]. In addition, linkage analysis could be more powerful than association analysis for some genetic mechanisms; family data can also help to estimate familial risks [2]. Hence, linkage analysis remains a useful supplemental tool for mapping genes for complex diseases. As complex diseases often involve quantitative biomarkers or environmental factors, incorporating these quantitative factors into linkage mapping can improve the power to detect disease loci [3] or the efficiency of estimating disease loci. Efficiency is defined as the inverse of the variance estimate for the disease locus estimate; thus, smaller variance estimates correspond to higher efficiencies. Moreover, the incorporation of covariates provides information that can be used to characterize disease loci, which is important for understanding disease etiologies and mechanisms and for identifying population subgroups that may have particularly high disease risks [4]. Methodologic work has demonstrated that failure to adequately account for gene-covariate interaction in a genetic analysis can mask the effects of both genes and covariates [5-7]. Hence, it is important to develop linkage approaches that allow the inclusion of covariates.
Thus far, several linkage analyses including covariates have been proposed to account for linkage heterogeneity or to examine biological, environmental, gene-gene or gene-environment interaction effects. Devlin (2002) [5] accounts for linkage heterogeneity by incorporating a family-level covariate into likelihood-based mixture models; however, this approach accounts for linkage heterogeneity only. Greenwood and Greenwood (1997, 1999) [6,8] incorporated covariates into genome scanning approaches using sib pairs or relative pairs through model-based logarithm of odds (LOD) score approaches, where the generalized expected identity-by-descent (IBD) sharing was modeled as a function of some covariates through multinomial logistic regression. Rice (1999) [7] applied a novel technique to detect significant covariates in linkage analyses with a logistic regression approach using all sib pairs (concordant affected, concordant unaffected, and discordant), and Saccone et al. (2001) [9] further extended this analysis to cousin pairs. Olson (1999) [10] proposed a unified framework for model-free linkage analysis that can handle the separate inclusion of other ARPs, discordant relative pairs, covariates, or additional disease loci through a conditional-logistic parameterization. These regression-based approaches can easily be generalized to include all covariates; however, they assume either one disease locus or multiple unlinked loci and thus are not applicable to analyses of multiple linked loci. Among non-regression-based approaches, Hauser et al. (2004) [11] proposed a model-free LOD score approach that includes family-level covariate information. This approach also assumes only one disease locus and can incorporate only one covariate at a time. In addition, the problem of multiple testing may arise when researchers perform multiple tests or analyses using various combinations of multiple loci or covariates with these approaches.
On the other hand, most two-locus linkage approaches aim to detect the presence of a second susceptibility gene by accounting for the effects of a known susceptibility gene [12-14]. However, when two susceptibility loci are linked, the location of the first gene may be inaccurate because it was mapped without accounting for the effects of the linked gene. Thus, conditional analyses that rely on an inaccurate position for the first locus may result in an inaccurate estimate of the second disease locus as well. Biswas et al. (2003) [15] applied a Bayesian approach to simultaneously detect two linked disease genes; however, their approach was designed to detect genes under locus heterogeneity only, and this model-based approach requires the specification of unknown genetic parameters. Hence, linkage approaches that can simultaneously localize two linked disease genes are in great demand.
Rather than testing for the presence of linkage, Liang et al. (2001) [16] developed a novel, robust, model-free multipoint linkage method that simultaneously estimates both the position of a disease locus and its effect on the disease, along with the sampling uncertainty. The advantages of this method include: (i) It does not require specification of an underlying genetic model; hence, estimation of the parameters is robust to a wide variety of genetic mechanisms. (ii) The multiple testing issue is eliminated because a single test statistic is provided for linkage in the entire studied region, rather than testing the hypothesis for one marker at a time. (iii) While multiple markers are incorporated simultaneously in the gene mapping, there is no need to specify the phase of genotypic data with multiple markers. Many complex diseases, such as hypertension, schizophrenia, diabetes, and asthma, are usually defined as dichotomous phenotypic traits; however, they are also associated with quantitative biological markers or quantitative risk factors. As a result, Glidden et al. (2003) [17] further incorporated quantitative covariates into Liang's approach [16] and estimated the genetic effect of a disease locus through a logistic-type parametric model using affected sib pairs (ASPs). Based on the same study design, Chiou et al. (2005) [18] incorporated quantitative covariates into their linkage mapping and estimated the genetic effect of a disease locus nonparametrically. This quantitative covariate could be either an environmental risk factor or itself a quantitative trait. For a quantitative trait incorporated as a covariate, its QTL (quantitative trait locus) may directly underlie a pathway of the disease or be linked to the disease locus, or the trait may be indirectly associated with the disease.
Meanwhile, Schaid et al. (2005) [19] extended the without-a-covariate approach by Liang et al. [16] to different types of ARPs. The authors' extension relaxed the limitation to ASPs only and allowed an investigator to study the risk ratios of a disease gene estimated from multiple relative pairs; this work helped to uncover the underlying genetic mechanism of disease. To jointly localize two linked disease loci using ASP data, Biernacka et al. (2005) [20] extended this approach [16] to the localization of two linked disease-susceptibility genes. They also provided tests for the presence of two linked disease-susceptibility genes by a quasi-likelihood ratio test and a modified score test in another article [21]. Lin and Schaid (2007) [22] generalized the two-locus localization method to a variety of ARPs. Both unconstrained and constrained models, along with a score test and an examination of the goodness of fit of the constrained model, were described in their generalized method. As the etiology of complex diseases often involves quantitative variables (either genetic biomarkers or environmental factors) in addition to multiple disease loci, it is helpful to incorporate a quantitative variable while localizing two linked disease loci simultaneously using ARPs. We extended Lin and Schaid's (2007) [22] approach to incorporate quantitative covariates in two-locus linkage mapping using ARPs. Generally, a parametric statistical model is simpler and easier to interpret than a nonparametric model, while a nonparametric model has the flexibility to fit the data more closely. To take advantage of both parametric and nonparametric statistical models, we applied both to incorporate covariates. These methods can also be applied to account for heterogeneity from quantitative covariates as well as from multiple subgroups that are stratified by categorical covariates.
Systematic simulation studies under a variety of quantitative covariates were conducted to evaluate the gain in efficiency of estimating the disease loci from the proposed methods. The estimates from the proposed approaches with incorporation of covariates were compared with those from the approach without incorporating covariates. The collaborative study on the genetics of alcoholism (COGA) data released for GAW14 was used to illustrate the proposed approaches.
Methods
To incorporate relevant covariate information while simultaneously estimating the locations of two genes using all types of relative pairs in linkage analysis, we propose the following linkage approaches.
Simultaneous Localization of Two Linked Disease Susceptibility Genes with Incorporation of Covariates
Consider a chromosomal region harboring two linked disease loci, τ_{1} and τ_{2}, with M markers genotyped at the locations 0 = t_{1} < t_{2} < ⋯ < t_{M}. Let S_{ki}(t_{j}) be the identity-by-descent (IBD) sharing for the j^{th} marker of the i^{th} pair of ARP type k, j = 1,...,M, i = 1,...,n_{k}, k = 1,...,5. The five types of relative pairs considered are full siblings (SP, k = 1), half siblings (HS, k = 2), first cousins (FC, k = 3), grandparent-grandchild pairs (GP, k = 4) and avuncular pairs (AP, k = 5) [19]. The five types of affected relative pairs are abbreviated as ASP, AHS, AFC, AGP and AAP. Let x_{ki1} and x_{ki2} be the covariates associated with relatives 1 and 2, respectively, in the i^{th} relative pair of type k. Given the covariates and assuming that the recombination fraction does not depend on the covariates, the expectation of IBD sharing at t_{j} for a relative pair ki [22] is
where C_{lk}(x_{ki1}, x_{ki2}) = E(S_{ki}(τ_{l}) | x_{ki1}, x_{ki2}, Φ) - a_{k} is the genetic effect at locus l for a relative pair ki, l = 1, 2; Φ is the event of an ARP; d_{1} = |τ_{1} - t_{j}|, d_{2} = |t_{j} - τ_{2}|, d_{3} = |τ_{2} - τ_{1}|; a_{k} is the expected count for random sharing; b_{k}(d_{v}) controls the rate of decrease of expected sharing as the distance d_{v} from the trait locus increases; and v = 1,2,3. Haldane's mapping function was used to translate recombination fraction to map distance. The values of b_{k}(d_{v}) and d_{v} for each relative type k and the functions relating the risk ratio λ to C are listed in Additional file 1, Table S1 (adapted from Table 1 in Lin and Schaid (2007) [22]).
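As a rough illustration of how expected IBD sharing decays with distance from a trait locus, the sketch below implements Haldane's mapping function together with the one-locus special case for affected sib pairs from Liang et al. [16], E[S(t) | Φ] = 1 + (1 - 2θ)^2 C. It is not the two-locus expectation used in this section; the function names and the example values (τ = 35 cM, C = 0.3) are illustrative assumptions.

```python
import math

def haldane_theta(distance_cm: float) -> float:
    """Recombination fraction for a map distance in cM under Haldane's mapping function."""
    d_morgans = abs(distance_cm) / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def asp_expected_ibd(t_cm: float, tau_cm: float, c: float) -> float:
    """One-locus expected IBD sharing at position t for an affected sib pair.

    a_k = 1 is the expected sharing under no linkage; the excess sharing C at
    the trait locus decays by (1 - 2*theta)^2 with distance from tau.
    """
    theta = haldane_theta(t_cm - tau_cm)
    return 1.0 + (1.0 - 2.0 * theta) ** 2 * c

if __name__ == "__main__":
    # Hypothetical trait locus at 35 cM with excess sharing C = 0.3.
    for t in range(0, 100, 10):
        print(t, round(asp_expected_ibd(t, 35.0, 0.3), 4))
```

Under Haldane's function the sib-pair decay factor simplifies to exp(-4d) with d in Morgans, which is why sharing falls off symmetrically on both sides of the locus.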
Additional file 1. Table S1. Expected alleles shared IBD at location t for five types of ARPs and functions relating λ to C
Table 1. Simultaneous two-locus search incorporating quantitative traits with QTLs at τ_{1} (X_{QTL1}) or τ_{2} (X_{QTL2})
C_{1} and C_{2} represent the amount of excess IBD sharing at each of the two disease gene loci, which is increased by effects due to both disease genes. The simple "effect size" interpretation does not apply to C_{1} and C_{2} in the two-locus model because the magnitude of C_{1} depends not only on the effect of gene 1 but also on the distance between gene 1 and gene 2. C_{1} and C_{2} can each be reparameterized to represent excess sharing at a location due to the gene at that location and thus can be considered the "effect size" of that particular gene (see Appendix of [20], page 47). They can then be used to test for the presence of linkage. We applied parametric and nonparametric methods to model the association between the excess IBD sharing (C_{l}) at τ_{l}, l = 1, 2, and the covariates.
Parametric Modeling on C
In the parametric model, C_{1k} and C_{2k} can be modeled as functions of covariates [17]; one example is to postulate a logistic regression for IBD sharing at τ_{1} and τ_{2}. For relative-pair type k, with G_{lk} = (g_{lk1},⋯,g_{lkp})^{T} as the covariate vector, C_{1k} and C_{2k} were modeled separately, where g_{lkr} = g_{lkr}(x_{kr1}, x_{kr2}), r = 1,...,p, denote the covariates.
where β_{lk}^{T} = (β_{lk1},⋯,β_{lkp}), l = 1, 2, k = 1,...,5; f_{k} = 1 for ASP, f_{k} = 4 for AFC, and f_{k} = 2 for other ARPs. The gene-environment interaction for an environmental variable x_{r} can be assessed by examining whether the corresponding β-coefficient, β_{r}, differs significantly from zero. In addition, the interaction between two covariates on the genetic effects of the disease loci can be assessed by adding an interaction term between the two covariates.
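One common way to examine whether a β-coefficient differs from zero is a Wald test, sketched below. The text does not state which test was used, so this is an assumption; the function name and the example values are hypothetical.

```python
import math

def wald_test(beta_hat: float, se: float):
    """Return the Wald z statistic and two-sided normal p-value for H0: beta = 0."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta_hat / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical coefficient estimate and standard error, for illustration only.
z_stat, p_val = wald_test(0.8, 0.4)
```

The same function applies unchanged to the coefficient of an interaction term between two covariates.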
Nonparametric Modeling on C
For the nonparametric model, given the data
where K is a p-variate Epanechnikov kernel function,
H is a nonsingular square bandwidth matrix [18], and a_{k }is the expected count for random sharing [19].
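The idea of kernel smoothing the excess sharing over a covariate can be sketched as follows for a single covariate (p = 1), where the bandwidth matrix H reduces to a scalar h. This is a generic kernel-weighted average offered as an assumed illustration, not a reproduction of the estimator in [18]; the function names and the toy inputs are hypothetical.

```python
def epanechnikov(u: float) -> float:
    """Univariate Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_excess_sharing(x0, covariates, ibd_at_locus, h, a_k=1.0):
    """Kernel-weighted mean of (S - a_k) around the covariate value x0.

    covariates   : pair-level covariate values (one per relative pair)
    ibd_at_locus : IBD sharing of each pair at the putative trait locus
    h            : scalar bandwidth (the p = 1 case of the matrix H)
    a_k          : expected sharing under no linkage (1 for sib pairs)
    """
    weights = [epanechnikov((x - x0) / h) for x in covariates]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no pairs fall within the bandwidth of x0")
    return sum(w * (s - a_k) for w, s in zip(weights, ibd_at_locus)) / total

# Two toy sib pairs equidistant from x0 = 0, sharing 2 and 1 alleles IBD.
c_hat = kernel_excess_sharing(0.0, [-0.5, 0.5], [2, 1], h=1.0)
```

Pairs whose covariate values lie more than one bandwidth from x0 receive zero weight, so a wider bandwidth trades local detail for lower variance.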
Estimating τ_{1} and τ_{2}
Given the function C_{lk}(x_{ki1}, x_{ki2}), the trait locus τ_{l} can be estimated by solving the estimating equation (4) below [16,18]. Once the estimate of C_{lk} is obtained, it can be plugged into equation (4) and the estimate of τ_{l} can be updated. That is, we replace C_{lk}(x_{ki1}, x_{ki2}) with the estimate
where S_{ki} = (S_{ki}(t_{1}),⋯,S_{ki}(t_{M}))', and
with
The estimates of C_{lk} and δ were iteratively updated until the convergence criteria for δ were met. Assuming all relative pairs share a common δ, the estimate of δ is asymptotically normal (see Additional file 2, Appendix for details) with mean vector δ and covariance matrix Σ^{-1}.
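To make the interplay between the locus and the excess sharing concrete, the toy sketch below jointly recovers a single locus τ and its excess sharing C for affected sib pairs from noise-free mean IBD values. It swaps in ordinary least squares with a grid search over τ in place of the estimating equations and iterative updates described above, so it is a simplification under stated assumptions rather than the proposed procedure; all names and values are illustrative.

```python
import math

def decay(t_cm, tau_cm):
    """ASP decay factor (1 - 2*theta)^2 = exp(-4d) under Haldane's map, d in Morgans."""
    return math.exp(-0.04 * abs(t_cm - tau_cm))

def profile_fit(markers_cm, mean_ibd, grid_cm):
    """Grid search over tau; for each tau the least-squares C has a closed form."""
    best = None
    for tau in grid_cm:
        w = [decay(t, tau) for t in markers_cm]
        c = sum(wj * (m - 1.0) for wj, m in zip(w, mean_ibd)) / sum(wj * wj for wj in w)
        sse = sum((m - 1.0 - c * wj) ** 2 for wj, m in zip(w, mean_ibd))
        if best is None or sse < best[0]:
            best = (sse, tau, c)
    return best[1], best[2]

markers = [10.0 * j for j in range(10)]      # 0, 10, ..., 90 cM
true_tau, true_c = 35.0, 0.3                 # illustrative values
mean_ibd = [1.0 + true_c * decay(t, true_tau) for t in markers]  # noise-free means
tau_hat, c_hat = profile_fit(markers, mean_ibd, [0.5 * g for g in range(181)])
```

With real data the mean IBD values are noisy, and the covariance structure across markers matters, which is what the estimating-equation formulation accounts for.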
Additional file 2. Appendix. Theoretical derivations.
Simulation Studies
Families with three generations including eight members were simulated: the first generation (four grandparents) included one or zero affected subjects, the second generation had no affected members, and the third generation included two affected individuals. In total, 200 independent families were simulated, each including one affected sib pair. Of the 200 families, 100 included two affected grandparent-grandchild pairs, with the others not having any affected grandparent-grandchild pairs. Hence, there were 200 ASPs and 200 AGPs per replicate. In total, 1,000 replicates were simulated for each configuration.
One disease locus model
First, we extended the one-locus model proposed by Schaid et al. (2005) [19] with ARPs to incorporate covariates using both parametric modeling [17] and nonparametric modeling [18]. We studied the enhancement of efficiency achieved by incorporating a quantitative covariate and by using relative pairs in place of sib pairs alone within a one-locus model. Three sets of penetrance rates (f_{2}, f_{1}, f_{0}) for the genotypes with two high-risk alleles (f_{2}), one high-risk and one low-risk allele (f_{1}), and two low-risk alleles (f_{0}) at the disease locus were used in the simulation study: (i) (0.67,0.05,0.007) (recessive model), (ii) (0.67,0.55,0.007) (dominant model) and (iii) (0.8,0.4,0.0) (additive model).
A covariate might be directly or indirectly associated with the disease loci, and the information from covariates under different genetic mechanisms may differentially enhance the search for the disease loci. We studied a variety of covariates correlated with the disease trait under different scenarios: (1) a quantitative trait with a pleiotropic effect (that is, a quantitative trait that is controlled by the disease locus τ_{1}, so that its QTL is τ_{1}, yet is not directly associated with liability to the disease); (2) a quantitative trait with a coincident effect, in which the QTL is linked to a disease locus by coincidence yet does not share common genetic components with the disease locus; (3) a quantitative trait unlinked to the disease loci; (4) a covariate of age at onset with the distribution log T = -log λ - βZ + ε/γ, where Z is the number of copies of the disease allele [17] at one disease locus. The variable ε is distributed as a standard extreme-value random variable, with λ = 0.03, γ = 5.0, and β = 0.57; this distribution was built assuming that the disease allele frequency is 0.05. The distribution of age at onset (T) followed a Weibull distribution, and the disease allele accelerated the onset of disease by a factor of 1.78. The threshold of age at onset was 70.
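The age-at-onset covariate in scenario (4) can be sketched as below. The sign convention log T = -log λ - βZ + ε/γ is an assumption made here so that each copy of the disease allele shortens the time to onset; ε is taken as the logarithm of a unit exponential variate, which makes T Weibull with shape γ. Function names and the random seed are illustrative.

```python
import math
import random

LAMBDA, GAMMA, BETA = 0.03, 5.0, 0.57   # parameter values quoted in the text

def sample_onset(z, rng):
    """Draw an age at onset T for a carrier of z disease alleles.

    With e ~ Exp(1), eps = log(e) is a standard (minimum) extreme-value variate,
    so T = exp(-log(lambda) - beta*z + eps/gamma) = exp(-beta*z)/lambda * e**(1/gamma).
    """
    e = rng.expovariate(1.0)
    return math.exp(-BETA * z) / LAMBDA * e ** (1.0 / GAMMA)

def median_onset(z):
    """Closed-form median of the implied Weibull distribution."""
    return math.exp(-BETA * z) / LAMBDA * math.log(2.0) ** (1.0 / GAMMA)

rng = random.Random(2024)
ages = [sample_onset(z=1, rng=rng) for _ in range(5)]
```

Under this parameterization the ratio of median onset ages between non-carriers and single-copy carriers equals exp(β).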
The quantitative trait y for scenarios (1)-(3) follows a multivariate normal distribution y_{i} = μ_{i} + g_{i} + e_{i}, e_{i} ~ N(0, Σ_{i}), i = 1,...,n, where
Two disease locus model
Furthermore, we simulated a two-locus disease model and compared the estimates of τ_{1} and τ_{2} from approaches with and without incorporating a covariate. We generated the two-locus models of model B in Biernacka et al. [20], as described in Additional file 3, Table S2, to study the impact of covariates on the estimates from the without-a-covariate approach and the parametric and nonparametric with-a-covariate approaches.
Additional file 3. Table S2. The two-locus genetic model used in simulation studies.
For genotype data, we generated ten markers that were equally spaced at 10 cM between adjacent markers, with each marker having eight equal-frequency alleles, and the two diallelic disease loci were located at 35 and 75 cM. For scenarios (1), (2) and (3), an additive genetic model for the quantitative trait covariate was assumed. The covariate used in modeling C_{l} was denoted by y_{l}, with l = 1,2. Assuming the quantitative traits X_{QTL1} and X_{QTL2} were controlled by τ_{1} and τ_{2}, respectively, we examined the impact of different combinations of traits incorporated in the functions g_{lk} on estimating the two trait loci. As in the simulation for the one-locus model, four scenarios were considered for the QTL of each covariate: (1) the QTL is at 35 cM (τ_{1}) (pleiotropic effect); (2) the QTL for "age at onset" (covariate) is at 35 cM (τ_{1}); (3) the quantitative trait's QTL is at 45 cM (coincident effect); (4) the covariate's QTL is not linked to either disease locus. All covariates were determined by averaging the two individuals' covariate values in one pair, that is, g_{ki} = (x_{ki1} + x_{ki2})/2.
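The simulated map and the pair-level covariate can be written down directly; the short sketch below assumes the first marker sits at 0 cM, as in the Methods, and uses hypothetical helper names.

```python
def pair_covariate(x1: float, x2: float) -> float:
    """Pair-level covariate: the mean of the two relatives' covariate values."""
    return (x1 + x2) / 2.0

def flanking_markers(position_cm, markers_cm):
    """Return the nearest markers at or below and at or above a map position."""
    left = max(m for m in markers_cm if m <= position_cm)
    right = min(m for m in markers_cm if m >= position_cm)
    return left, right

markers_cm = [10.0 * j for j in range(10)]   # ten markers at 0, 10, ..., 90 cM
disease_loci_cm = (35.0, 75.0)               # the two simulated disease loci
flanks = [flanking_markers(tau, markers_cm) for tau in disease_loci_cm]
```

Both disease loci fall midway between adjacent markers, so neither coincides with a genotyped position.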
Results
For the comparison under one-locus models (Figure 1; Additional file 4, Tables S3-S5), the efficiency in estimating the disease locus was enhanced substantially when incorporating a quantitative covariate, regardless of its underlying genetic mechanism. In the additive model using affected sib pairs, the relative efficiency (RE) ranged from 1.24 to 1.69 for the parametric approach and from 2.37 to 2.40 for the nonparametric approach. After adding affected grandparent-grandchild pairs, the RE ranged from 3.9 to 3.95 for the parametric approach and from 1.67 to 2.13 for the nonparametric approach. The parametric approach generally had higher RE than the nonparametric approach in the simulated scenarios (Additional file 4, Tables S3-S5). Given the same heritability of a quantitative trait, incorporating a quantitative trait with a pleiotropic effect was generally more efficient than incorporating a linked or an unlinked trait. The variance estimate for
Additional file 4. Tables S3-S5. Table S3. Comparisons of estimates for τ with and without incorporation of a quantitative covariate under the one-locus recessive model (a). Table S4. Comparisons of estimates for τ with and without incorporation of a quantitative covariate under the one-locus dominant model (b). Table S5. Comparisons of estimates for τ with and without incorporation of a quantitative covariate under the one-locus additive model (c).
Figure 1. Relative efficiency (RE) between two approaches in estimating the disease locus under three genetic models (a), (b) and (c). The dotted lines are the RE for comparisons between two types of affected relative pairs in the nonparametric approach. The solid lines are the RE for comparisons between two types of affected relative pairs in the parametric approach. ASP, AGG and COM stand for affected sib pairs, affected grandparent-grandchild pairs, and combined affected sib pairs and grandparent-grandchild pairs, respectively. ASP_wo stands for using ASP without incorporating a covariate. The circle, pound, v and x signs refer to the relationship between the covariate's QTL and the disease locus: (i) pleiotropy, (ii) coincident, (iii) unlinked, and (iv) a covariate of age at onset.
The smoothing parameter in (3) was set to one half of the range of the covariates, which roughly minimizes the variance estimate of the estimated loci in the analysis. The choice of bandwidth in the nonparametric approach did not have much impact on the estimation, however [18]. The selection of the function g(·) might slightly influence the bias and variance of the estimates for disease loci (results not shown). Results from both parametric and nonparametric approaches suggested that the efficiency in estimating the disease locus was improved when combining affected sib pairs and grandparent-grandchild pairs.
Since there were two linked loci controlling the disease, we generated covariates X_{QTL1} and X_{QTL2}, controlled by τ_{1} and τ_{2}, respectively, and studied the impact of four different ways to incorporate X_{QTL1} or X_{QTL2} into the linkage mapping: (i) incorporating X_{QTL1} only (y_{1} = X_{QTL1}, y_{2} = X_{QTL1}); (ii) incorporating X_{QTL2} only (y_{1} = X_{QTL2}, y_{2} = X_{QTL2}); (iii) incorporating y_{1} = X_{QTL1}, y_{2} = X_{QTL2} to estimate C_{1}, C_{2}, respectively; (iv) incorporating y_{1} = X_{QTL2}, y_{2} = X_{QTL1} to estimate C_{1}, C_{2}, respectively. Table 1 illustrates the impact of choosing different covariates on the estimates from the parametric and nonparametric approaches. In reality, we do not have information about the underlying genetic mechanism of the quantitative traits (covariates); fortunately, the efficiency in estimating the disease loci was improved under any one of the above scenarios when compared to the estimates made without covariates. Since the quantitative traits were controlled by the two disease loci, incorporating both quantitative traits was helpful in estimating both loci and their 95% coverage probabilities. When incorporating only one quantitative trait, the bias and variance estimate for its corresponding disease locus (QTL) were smaller; this finding was particularly true within the parametric approach. Additionally, both of the covariates were significantly associated with the genetic effects from the two disease loci in the parametric approach (p-values = 0.029 to 0.050).
We also evaluated the performance of the parametric and nonparametric approaches with varying locations for the covariates' QTLs (Table 2). In the parametric approach, the efficiency in estimating a disease locus was improved when the covariate's QTL was linked to the disease locus, particularly when the disease locus was also the QTL of the covariate. For example, when no covariate was incorporated, the variance estimates were 7.5 and 6.9 for the two disease loci, respectively (Additional file 5, Table S6); when a quantitative trait with a pleiotropic effect was incorporated, the variance estimates were 4.0 and 4.0, respectively (Table 2). Compared with the estimate without incorporating a covariate, the bias was slightly higher when the covariate's locus was not the disease locus but was instead linked or unlinked to the disease locus. The biases for estimating the two loci were 0.02 and 0.2 with the pleiotropic covariate and 0.3 and 0.4 with the unlinked covariate (Table 2). In the parametric approach, the magnitude of the regression coefficient reflects the association between the disease locus and the covariate. The regression coefficient was significant only when the covariate's QTL was one of the disease loci (pleiotropic effect) (Table 2). After incorporating a covariate, the 95% coverage probabilities for τ_{1} and τ_{2} were both more precise than those obtained without incorporating a covariate (Tables 1 and 2; Additional file 5, Table S6). In the nonparametric approach, the efficiency in estimating both disease loci was improved when the covariate's QTL was at position τ_{1} (Table 2; pleiotropic covariate or age at onset). The efficiency was lower when the covariate's QTL was linked or unlinked to position τ_{1} (Table 2). The bias was generally higher for τ_{2} in the scenario where the covariate provides information for τ_{1} only (Table 2).
Additional file 5. Table S6. Simultaneous two-locus search without incorporating a covariate.
Table 2. The impact of the location of the QTL for the covariate: parametric and nonparametric approaches
A Data Example
We conducted an autosome-wide scan for affected relative pairs from the COGA data [23]. Note that the disease was defined as "having psychological problems from drinking." There were 149 affected sib pairs, 8 half-sib pairs, 16 first-cousin pairs, 7 grandparent-grandchild pairs, and 71 avuncular pairs in this data set. Due to the limited sample sizes for some relative pairs, we examined the linkage peak on chromosome 1 using 149 affected sib pairs and 71 avuncular pairs, with and without incorporating the quantitative covariate "Maximum number of drinks in a 24 hour period." Using both ASPs and AAPs, the disease locus was estimated to be at 113.7 cM on chromosome 1, with a 95% CI of 109.5-118.0 cM. The estimate for C_{ASP} was 0.18 with a 95% CI of 0.10-0.26 (p-value = 7.6e-6), whereas the estimate for C_{AAP} was 0.064 with a 95% CI of 0.0001 to 0.13 (p = 0.051) (Table 3 and Additional file 6, Figure S1). We also applied single-locus linkage mapping with a covariate using ARPs to locate the disease locus and assess the significance of its covariates. The disease locus estimate was 110.8 cM (standard error (SE) = 1.5) and 109.2 cM (SE = 2.3) in the parametric and nonparametric approaches, respectively, using all ARPs. The p-values of the covariate in the parametric approach were 0.52 and 0.20 for ASP and AAP, respectively (Table 3). To identify a region harboring two disease loci, we plotted the empirical IBD sharing of all autosomes for ASPs (because the data set included mostly sib pairs). After visually reviewing the empirical IBD sharing on all autosomes, we selected chromosome 3 as a region to illustrate our approach, as there appeared to be two disease-susceptibility loci harbored within this region (Figure 2). First, we conducted the two-locus search without incorporating the covariate (Table 4) and compared the estimates to those that did incorporate covariates.
The quantitative measure "maximum number of drinks in a 24-hour period" [24] was incorporated into the linkage mapping, both parametrically (Table 5) and nonparametrically (Table 6). The 95% confidence intervals (CIs) for C or λ were constructed with the bootstrap resampling approach. A total of 1,000 replicates were obtained by resampling. The disease loci estimates were computed for each sample and ranked. The lower and upper limits of the 95% confidence interval were the 2.5% and 97.5% percentiles of the 1,000 replicates, respectively.
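The percentile bootstrap described above can be sketched as follows. The resampling unit is not stated in the text, so resampling whole units (for example, families or pairs) with replacement is an assumption; the toy data and the use of the sample mean as a stand-in for the locus estimator are made up for illustration.

```python
import random
import statistics

def percentile(sorted_values, q):
    """Linear-interpolation percentile of pre-sorted values, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def bootstrap_ci(units, estimator, n_boot=1000, seed=0):
    """Percentile bootstrap: resample units with replacement, re-estimate, rank."""
    rng = random.Random(seed)
    estimates = sorted(
        estimator([rng.choice(units) for _ in units]) for _ in range(n_boot)
    )
    return percentile(estimates, 0.025), percentile(estimates, 0.975)

# Toy usage with made-up numbers; the mean stands in for the locus estimator.
toy = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4]
lower, upper = bootstrap_ci(toy, statistics.mean)
```

In an actual analysis the `estimator` argument would rerun the full linkage fit on each resampled data set, which is the expensive step.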
Additional file 6. Figure S1. Comparisons of estimates (denoted by "x") and their 95% CIs (denoted by brackets) for the disease locus on chromosome 1 from nonparametric, parametric and without-a-covariate approaches using affected sib pairs
Table 3. One-locus search on chromosome 1 with or without incorporation of "Maximum number of drinks in a 24 hour period"
Figure 2. Comparisons of estimates (denoted by "x") and their 95% CIs (denoted by brackets) for disease loci from nonparametric, parametric and without-a-covariate approaches using affected sib pairs.
Table 4. Simultaneous two-locus search without incorporating a covariate
Table 5. Simultaneous two-locus search with incorporation of "Maximum number of drinks in a 24 hour period": parametric approach
Table 6. Simultaneous two-locus search with incorporation of "Maximum number of drinks in a 24 hour period": nonparametric approach
The standard errors for the estimates of the disease loci were always smaller when using the entire data set with both sib pairs and avuncular pairs, compared to the estimates using sib pairs or avuncular pairs alone. Compared to the approach without the covariate, the relative efficiencies (each defined as the ratio of the inverse variance estimates for the disease locus estimates) in estimating τ_{1} and τ_{2} were 20.25 ((0.7/0.2)^{2}) and 8.92 ((6.84/2.29)^{2}) for the nonparametric approach (Table 6) and 0.24 ((0.72/1.47)^{2}) and 11.8 ((6.84/1.99)^{2}) for the parametric approach (Table 5). The average estimated C_{1} and C_{2} were 0.084 and 0.16 for affected sib pairs in the nonparametric approach (Table 6), and were 0.16 and 0.24 in the parametric approach (Table 5). The corresponding risk ratios λ_{l} for these two loci in sib pairs within the nonparametric approach were 1.20 (95% CI: 0.99 to 1.79) and 1.45 (95% CI: 1.02 to 2.09), respectively (Table 6). The C value (or risk ratio) at τ_{2} (0.237, 95% CI: 0.066 to 0.430) was higher than that at τ_{1} (0.156, 95% CI: 0.014 to 0.319), and it was marginally significant after incorporation of the covariate (Table 5). The C_{l} and λ_{l} values estimated from avuncular pairs were smaller than those estimated from sib pairs (Tables 4, 5, 6) with incorporation of the covariate; however, this difference was not statistically significant. Since there was no evidence of linkage at τ_{1}, the estimate for τ_{1} varied among the three approaches.
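The relative-efficiency arithmetic quoted above is a ratio of inverse variances, which reduces to the squared ratio of standard errors. A minimal sketch, using the standard errors quoted in the text (the function name is illustrative):

```python
def relative_efficiency(se_without: float, se_with: float) -> float:
    """Ratio of inverse variances: (1 / se_with^2) / (1 / se_without^2)."""
    return (se_without / se_with) ** 2

# Standard errors for tau_2 quoted in the text (without vs. with the covariate).
re_nonparametric_tau2 = relative_efficiency(6.84, 2.29)
re_parametric_tau2 = relative_efficiency(6.84, 1.99)
# Parametric approach for tau_1: a value below 1 means efficiency was lost.
re_parametric_tau1 = relative_efficiency(0.72, 1.47)
```

Values above 1 indicate that incorporating the covariate reduced the variance of the locus estimate.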
Discussion and Conclusions
Many complex diseases involve multiple loci as well as multiple quantitative biological markers or quantitative risk factors. Incorporating covariates into linkage analysis is not only helpful for the identification of disease loci but is also informative with respect to disease etiology. In family-based studies, data are often available for larger pedigrees with multiple relative pairs, and therefore it is desirable to have linkage mapping approaches that can use these potentially informative data. In addition, different types of ARPs may have the potential of providing some insight into the underlying genetic mechanism [19]. Applying a one-locus model to localize a disease gene when there are actually two linked disease genes in the region is likely to estimate the two true disease gene locations inaccurately, while the corresponding effect size tends to be overestimated [20]. Therefore, we extended a robust multipoint linkage approach to simultaneously map two linked disease loci using affected relative pairs with incorporation of quantitative covariates. A series of intensive simulation studies were conducted to examine the performance of the approach when the incorporated covariate was a quantitative trait under a variety of genetic models or when the trait was a risk factor associated with a disease locus. The simulation study suggested that incorporating a quantitative covariate, which also happened to be a quantitative trait, helped improve the efficiency of the disease-locus estimate, regardless of the genetic models that actually underlie the incorporated covariate. It seems that the underlying genetic models of the quantitative covariate (trait) did not have much impact on the efficiency in estimating τ_{l}, l = 1,2.
In addition, the inclusion of different relative pairs enlarges the sample and improves the efficiency of disease-locus localization when the different relative pairs share common disease loci; this would be particularly true when the genetic effect of the disease loci is small or modest. When the covariate was directly related to the liability of the disease, the efficiency improvement was greater than when it was not; when the covariate was associated with only one disease locus, incorporating the covariate improved the efficiency of that locus's estimate more than that of the other locus. The position of the QTL for a quantitative trait (as a covariate) might slightly affect the accuracy of the disease-loci localization; when the QTL was unlinked to the disease locus, the accuracy was similar to that obtained without covariates. Investigators can choose to incorporate covariates that improve efficiency in disease-loci estimation. Our example of an alcoholism study illustrates that incorporating a quantitative covariate into the linkage mapping helps improve the efficiency of disease-loci estimates in the two-locus models by either the parametric approach or the nonparametric approach. The assessment of associations between the disease loci and covariates helps resolve the underlying genetic mechanism of the disease. Using all affected relative pairs to estimate the common disease loci could also enhance the efficiency in estimating disease loci, and, furthermore, it could help dissect disease etiology by assessing risk ratios among different types of relative pairs.
Although the proposed approaches can be quite helpful and can be widely applied to localize disease loci for complex diseases, they are built upon the assumption of a two-locus disease mechanism. Bias may arise when the region examined harbors only one locus or more than two linked loci. In addition, since the relationships between the genetic effects at the two disease loci and the covariates are modeled separately, the number of parameters can grow quickly when (1) several covariates are incorporated simultaneously; or (2) regression relationships between the genetic effects at the two disease loci and covariates are not assumed to be identical; or (3) several relative types are analyzed. Additionally, since fitting an incorrect model can lead to biased estimates with anti-conservative confidence intervals, it is important to decide whether a one-locus or two-locus model is more appropriate. In practice, it is always helpful to check the empirical plot (as shown in Figure 2) to determine how many "peaks" are present in the region of interest. If there is only one "peak," a one-locus model might be more appropriate than a two-locus model. If more than two peaks are present, it might be helpful to split the region into multiple smaller regions containing only two peaks each. Indeed, it is always helpful to apply both one-locus and two-locus models and evaluate which model fits the data better. In addition, the test developed by Biernacka et al. [21] can be used to help choose an appropriate model.
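The peak-counting heuristic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the profile values, the `min_height` threshold, and the function names are hypothetical, and the statistic actually plotted in Figure 2 is not specified here. A peak is taken to be a strict local maximum at or above the threshold; plateaus and noise smoothing are deliberately ignored.

```python
def find_peaks(profile, min_height):
    """Return indices of strict local maxima that reach min_height."""
    peaks = []
    for i in range(1, len(profile) - 1):
        is_local_max = profile[i - 1] < profile[i] > profile[i + 1]
        if is_local_max and profile[i] >= min_height:
            peaks.append(i)
    return peaks


def suggest_model(n_peaks):
    """Map a peak count to the modeling suggestion given in the text."""
    if n_peaks == 0:
        return "no clear peak"
    if n_peaks == 1:
        return "one-locus model"
    if n_peaks == 2:
        return "two-locus model"
    return "split region into sub-regions with two peaks each"


# Hypothetical profile of a linkage statistic along equally spaced map positions
profile = [0.1, 0.3, 0.8, 0.4, 0.2, 0.6, 0.9, 0.5, 0.1]
peaks = find_peaks(profile, min_height=0.5)
suggestion = suggest_model(len(peaks))
print(peaks, suggestion)
```

Such a count is only a screening aid; fitting both one-locus and two-locus models and comparing their fit remains the more reliable check.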
The proposed approaches allow gene-gene and gene-environment interactions to be assessed. As complex diseases often involve more than two disease genes, further efforts to extend this method to situations involving more than two genes are warranted. In addition, as the regions identified through linkage mapping are quite wide and may harbor numerous genes, future approaches should be developed to identify potential causal polymorphisms by the joint modeling of linkage and association.
Authors' contributions
YFC, JMC and KYL contributed to the theory derivation, simulation study, statistical modeling and drafting of the manuscript. CYL participated in the design of the study and performed the simulation studies and data analysis. All authors read and approved the final manuscript.
Acknowledgements
We thank the Collaborative Study on the Genetics of Alcoholism (U10AA008401) for providing the data. We thank the reviewers for their constructive comments, which greatly improved the quality of this manuscript. This work was supported by grant GRC 94B0011 to J.M.C. from Academia Sinica; in part, by grants PH098pp04 and NSC982118M400002 to Y.F.C. from the National Health Research Institutes and the National Science Council, respectively; and by a grant to K.Y.L. from the National Institutes of Health, U.S.A. (HL090577).
References

1. Roeder K, Bacanu SA, Wasserman L, Devlin B: Using linkage genome scans to improve power of association in genome scans.
Am J Hum Genet 2006, 78:243-252.

2. Clerget-Darpoux F, Elston RC: Are linkage analysis and the collection of family data dead? Prospects for family studies in the age of genome-wide association.
Hum Hered 2007, 64(2):91-96.

3. Goddard KA, Witte JS, Suarez BK, Catalona WJ, Olson JM: Model-free linkage analysis with covariates confirms linkage of prostate cancer to chromosomes 1 and 4.
Am J Hum Genet 2001, 68(5):1197-1206.

4. Gauderman WJ, Siegmund KD: Gene-environment interaction and affected sib pair linkage analysis.
Hum Hered 2001, 52(1):34-46.

5. Devlin B, Jones BL, Bacanu SA, Roeder K: Mixture models for linkage analysis of affected sibling pairs and covariates.
Genet Epidemiol 2002, 22(1):52-65.

6. Greenwood CM, Bull SB: Incorporation of covariates into genome scanning using sib-pair analysis in bipolar affective disorder.
Genet Epidemiol 1997, 14(6):635-640.

7. Rice JP, Rochberg N, Neuman RJ, Saccone NL, Liu KY, Zhang X, Culverhouse R: Covariates in linkage analysis.
Genet Epidemiol 1999, 17(Suppl 1):S691-695.

8. Greenwood CM, Bull SB: Analysis of affected sib pairs, with covariates--with and without constraints.
Am J Hum Genet 1999, 64(3):871-885.

9. Saccone NL, Rochberg N, Neuman RJ, Rice JP: Covariates in linkage analysis using sibling and cousin pairs.
Genet Epidemiol 2001, 21(Suppl 1):S540-545.

10. Olson JM: A general conditional-logistic model for affected-relative-pair linkage studies.
Am J Hum Genet 1999, 65(6):1760-1769.

11. Hauser ER, Watanabe RM, Duren WL, Bass MP, Langefeld CD, Boehnke M: Ordered subset analysis in genetic linkage mapping of complex traits.
Genet Epidemiol 2004, 27(1):53-63.

12. Farrall M: Affected sib-pair linkage tests for multiple linked susceptibility genes.
Genet Epidemiol 1997, 14(2):103-115.

13. Delepine M, Pociot F, Habita C, Hashimoto L, Froguel P, Rotter J, Cambon-Thomsen A, Deschamps I, Djoulah S, Weissenbach J, et al.: Evidence of a non-MHC susceptibility locus in type I diabetes linked to HLA on chromosome 6.
Am J Hum Genet 1997, 60(1):174-187.

14. Cordell HJ, Wedig GC, Jacobs KB, Elston RC: Multilocus linkage tests based on affected relative pairs.
Am J Hum Genet 2000, 66(4):1273-1286.

15. Biswas S, Papachristou C, Irwin ME, Lin S: Linkage analysis of the simulated data - evaluations and comparisons of methods.
BMC Genet 2003, 4(Suppl 1):S70.

16. Liang KY, Chiu YF, Beaty TH: A robust identity-by-descent procedure using affected sib pairs: multipoint mapping for complex diseases.
Hum Hered 2001, 51(1-2):64-78.

17. Glidden DV, Liang KY, Chiu YF, Pulver AE: Multipoint affected sib-pair linkage methods for localizing susceptibility genes of complex diseases.
Genet Epidemiol 2003, 24(2):107-117.

18. Chiou JM, Liang KY, Chiu YF: Multipoint linkage mapping using sib-pairs: nonparametric estimation of trait effects with quantitative covariates.
Genet Epidemiol 2005, 28(1):58-69.

19. Schaid DJ, Sinnwell JP, Thibodeau SN: Robust multipoint identical-by-descent mapping for affected relative pairs.
Am J Hum Genet 2005, 76(1):128-138.

20. Biernacka JM, Sun L, Bull SB: Simultaneous localization of two linked disease susceptibility genes.
Genet Epidemiol 2005, 28(1):33-47.

21. Biernacka JM, Cordell HJ: Exploring causality via identification of SNPs or haplotypes responsible for a linkage signal.
Genet Epidemiol 2007, 31(7):727-740.

22. Lin WY, Schaid DJ: Robust multipoint simultaneous identical-by-descent mapping for two linked loci.
Hum Hered 2007, 63(1):35-46.

23. Edenberg HJ, Bierut LJ, Boyce P, Cao M, Cawley S, Chiles R, Doheny KF, Hansen M, Hinrichs T, Jones K, et al.: Description of the data from the Collaborative Study on the Genetics of Alcoholism (COGA) and single-nucleotide polymorphism genotyping for Genetic Analysis Workshop 14.
BMC Genet 2005, 6(Suppl 1):S2.

24. Bagnardi V, Zatonski W, Scotti L, La Vecchia C, Corrao G: Does drinking pattern modify the effect of alcohol on the risk of coronary heart disease? Evidence from a meta-analysis.
J Epidemiol Community Health 2008, 62(7):615-619.