Email updates

Keep up to date with the latest news and content from BMC Genetics and BioMed Central.

Open Access Highly Accessed Methodology article

Multitrait analysis of quantitative trait loci using Bayesian composite space approach

Ming Fang1*, Dan Jiang2, Li Jun Pu1, Hui Jiang Gao34, Peng Ji5, Hong Yi Wang5 and Run Qing Yang6

Author Affiliations

1 Life Science College, Heilongjiang August First Land Reclamation University, Daqing, 163319, PR China

2 College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, PR China

3 College of Animal Science and Technology, Northeast Agricultural University, Harbin, 150030, PR China

4 College Animal Science and Technology, China Agricultural University, Beijing, 100094, PR China

5 College of Plant Science and Technology, Heilongjiang August First Land Reclamation University, Daqing, 163319, PR China

6 School of Agriculture and Biology, Shanghai Jiaotong University, Shanghai, 201101, PR China

For all author emails, please log on.

BMC Genetics 2008, 9:48  doi:10.1186/1471-2156-9-48

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2156/9/48


Received:19 September 2007
Accepted:18 July 2008
Published:18 July 2008

© 2008 Fang et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Multitrait analysis of quantitative trait loci can capture the maximum information of experiment. The maximum-likelihood approach and the least-square approach have been developed to jointly analyze multiple traits, but it is difficult for them to include multiple QTL simultaneously into one model.

Results

In this article, we have successfully extended Bayesian composite space approach, which is an efficient model selection method that can easily handle multiple QTL, to multitrait mapping of QTL. There are many statistical innovations of the proposed method compared with Bayesian single trait analysis. The first is that the parameters for all traits are updated jointly by vector or matrix; secondly, for QTL in the same interval that control different traits, the correlation between QTL genotypes is taken into account; thirdly, the information about the relationship of residual error between the traits is also made good use of. The superiority of the new method over separate analysis was demonstrated by both simulated and real data. The computing program was written in FORTRAN and it can be available for request.

Conclusion

The results suggest that the developed new method is more powerful than separate analysis.

Background

Multitrait analysis is defined as a method that includes all traits simultaneously in a single model [1], and can take into account the correlation among all traits. Many methods have been developed for mapping QTL by combining information of multiple traits. Jiang and Zeng [2] proposed a maximum likelihood approach, and concluded that joint analysis could improve the precision of parameter estimates and had higher QTL detecting power than separate analysis. A multitrait least-square approach was proposed by Knott and Haley [3] to detect QTL. It is a method that programs easily and computes fast, and compared with separate analysis of each trait, can increase the power to detect a pleiotropic QTL and improve the precision of the location estimate. Xu et al. [1] developed a maximum likelihood approach for jointly mapping multiple binary traits, which is implemented via EM algorithm. They found that the QTL detecting power of joint analysis was higher than the sum of those of separate analysis. But after the QTL detecting power for separate analysis was redefined more reasonably by a combined power (see also [1]), the power of joint analysis was almost equal to the combined power, that is, joint analysis had almost the same power as separate analysis. For QTL parameter estimation, joint analysis can improve the precision of the QTL position estimates, but the QTL effects and their standard deviations have no obvious difference. Another class of approaches for multitrait analysis that use a dimension reduction technique was proposed by Korol et al. [4]. Mangin et al. [5] used this technique to analyze independent PCA (principal components analysis) trait, and used the PCA test values to detect QTL, which was proved to be asymptotically equivalent to the multivariate maximum-likelihood ratio test. However, the parameters of this kind of methods are often too difficult to interpret biologically. A maximum-likelihood method for multitrait mapping of QTL under outbred population was developed by Eaves et al. [6], which based on identity-by-descent (IBD) variance components model approach, and QTL effects were treated as random.

All the joint mapping approaches mentioned above were based on one-QTL model. Recently, Bayesian methodology has been used for mapping QTL [7-17], and the main advantage is that it can easily handle multiple QTL simultaneously. Currently, Bayesian reversible jump MCMC (RJMCMC) has become a usual method for mapping multiple QTL. Liu et al. [7] applied the method to multitrait mapping of QTL in outbred population under random effect model. However, because the dimension of RJMCMC is variable, it is always subject to poor mixing and hard to converge. Godsill [18] developed an effective Bayesian composite space method for model selection which keeps the model dimension fixed in each round of updating, and therefore it converges faster and is much easier to program. Yi et al. [15-17] successfully applied the novel approach to map QTL. In this article, we extend Bayesian composite space approach to multitrait analysis under inbred line crosses, and use both simulated data and real data to demonstrate the advantages and disadvantages of the proposed method.

Results

Simulation Study

We simulated 200 backcross individuals, and each has marker information and phenotypic records for three traits. One chromosome with length of 600 cM was investigated. Twenty-one markers were put on the genome with an average distance of 20 cM. Marker genotypes were observed for all the individuals. Thirteen QTL were added onto the genome, of which locus 96, 423, 487 and 584 had pleiotropic effects, and locus 250, 253 and 256, and locus 535 and 537 were closely linked and controlled different traits respectively. The positions and the effects of QTL for each trait are listed in Table 1. The population means for all traits were set to zero. The residual (co)variances are listed in Table 2. The heritability of each trait can be calculated as 0.728 for trait 1, 0.691 for trait 2 and 0.598 for trait 3.

Table 1. QTL Parameters and their estimates obtained from the simulated data

Table 2. The true values and their estimates of residual error (co)variance obtained from the simulated data

In order to investigate the performance of our approach, two methods were used to analyze the simulated data. The first method was the proposed multitrait analysis; the second is single-trait analysis. In single-trait analysis, we use the method 1 of [16], for the proposed method was a direct extension from it. In both multitrait analysis and single-trait analysis, the prior variance and degree of freedom of the residual error was set to zero, because no prior information was available. The prior expected number of QTL lk was 3 and the maximum number of QTL Lk equaled to the number of marker intervals (30). Therefore, the prior inclusion probability of the model indicator variable equaled to 0.1. For both methods, the MCMC ran for 1000 cycles as burn-in period (deleted) and then for additional 20,000 cycles after the burn-in. The chain was then thinned to reduce serial correlation by one observation saved every 10 cycles. The posterior sample contained 2000 (20, 000/10 = 2000) observations for the post-MCMC analysis.

The estimates of the QTL parameters for multitrait analysis and separate analysis are listed in Table 1 and Table 2. The results showed that there were no clear differences of the two methods in the estimates of the QTL positions, QTL effects and the corresponding standard deviation. Both methods can estimate QTL positions and effects, all closed to the true values.

Figure 1 and 2 respectively show the profiles of the posterior probability of the QTL positions and the 2logeBF statistic for multitrait analysis, and Figure 3 and 4 for separate analysis. From these figures, we found that both profiles of the posterior probability of QTL positions and the 2logeBF statistic for multitrait analysis are generally higher than those for separate analysis. Moreover, two additional QTL located at 483 and 245 were detected by multitrait analysis. These suggested that multitrait analysis may be more powerful than separate analysis.

thumbnailFigure 1. The profiles of the posterior probability for multitrait analysis using the simulated data. The profiles of the posterior probability obtained from multitrait analysis using the simulated data: (a) for trait 1; (b) for trait 2; (c) for trait 3. The true locations of the simulated QTL are indicated with an arrow (↑).

thumbnailFigure 2. The profiles of Bayes factors for multitrait analysis using the simulated data. The profiles of the Bayes factors (rescaled as 2logeBF and negative values are truncated as zero) obtained from multitrait analysis using the simulated data: (a) for trait 1; (b) for trait 2; (c) for trait 3. The true locations of the simulated QTL are indicated with an arrow (↑). The horizontal line indicates the critical value.

thumbnailFigure 3. The profiles of the posterior probability for single trait analysis using the simulated data. The profiles of the posterior probability obtained from separate analysis using the simulated data: (a) for trait 1; (b) for trait 2; (c) for trait 3. The true locations of the simulated QTL are indicated with an arrow (↑).

thumbnailFigure 4. The profiles of Bayes factors for single trait analysis using the simulated data. The profiles of Bayes factors (rescaled as 2 logeBF and negative values are truncated as zero) obtained from separate analysis using the simulated data: (a) for trait 1; (b) for trait 2; (c) for trait 3. The true locations of the simulated QTL are indicated with an arrow (↑). The critical value is given as horizontal line.

Real data analysis

We applied the new method to analyze the data from the North American Barley Genome Mapping Project [22]. The DH population included 150 lines (n = 150), each of which was genotyped for 223 codominant markers. These markers covered ~1500 cM of the genome along seven linkage groups with an average marker interval of ~7 cM. Eight traits, grain yield, lodging, height, heading data, grain protein, alpha amylase, diastatic power, and malt extract, were investigated in this project. Agronomic traits were measured in 16 areas, and malting quality traits in 9 areas. In our research, only three traits were studied, grain yield, height, and alpha amylase, and only the records in Crookston and Minnesota were used.

In the analysis, the prior expected number of QTL was taken as 3 for each trait, then the maximum number of QTL was calculated as Lk ≈ 3 + 3·<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M1">View MathML</a> or Lk = 8. Therefore, the prior inclusion probability of the model indicator variable equals to 0.375. To reduce the model space, we assumed each chromosome contain at most one QTL, except that the 7th was divided into two parts at the middle point and each part contains one QTL, for the results of other analysis (IM, CIM) always show signals of two QTL on 7th chromosome for some traits. Also two methods, multitrait analysis and Bayesian single-trait analysis (method 1 in [16]), were used to analyze the real data. The MCMC ran for 5 × 104 cycles after the first 2000 was discarded. The chain was thinned by every 10 cycles one observation being saved, which yielded 5000 samples for posterior Bayesian analysis.

Figure 5 and Figure 6 show the profiles of 2logeBF statistic with real data by multitrait analysis and separate analysis. The profiles of Figure 5 are generally higher than that of Figure 6. For trait 1 (grain yield), no QTL was detected by separate analysis (Figure 6a), while eight QTL were detected by multitrait analysis (Figure 5a); for trait 2 (height), three QTL located on chromosomes 1, 2, and 7 were detected by separate analysis, however by multitrait analysis, not only much stronger signals of these three QTL, but also four additional QTL on chromosome 3, 4, 5 and 6 were detected; for trait 3 (alpha amylase), two additional QTL located on chromosome 1, 3 were detected by multitrait analysis. The results of real data analysis also supported the conclusion that multitrait analysis was more powerful than separate analysis.

thumbnailFigure 5. The profiles of Bayes factors for multitrait trait analysis using real data. The profiles of Bayes factors (rescaled as 2 logeBF and negative values are truncated as zero) obtained from multitrait analysis using the real data: (a) for trait 1; (b) for trait 2; (c) for trait 3. The dotted vertical lines on the horizontal axis separate the chromosomes. The critical value is given as horizontal line. On the x-axis, inner tick marks represent markers.

thumbnailFigure 6. The profiles of Bayes factors for single trait analysis using real data. The profiles of Bayes factors (rescaled as 2 logeBF and negative values are truncated as zero) obtained from separate analysis using the real data: (a) for trait 1; (b) for trait 2; (c) for trait 3. The dotted vertical lines on the horizontal axis separate the chromosomes. The critical value is given as horizontal line. On the x-axis, inner tick marks represent markers.

Discussion

The selection of hyper-parameter of the QTL effect is important in Bayesian analysis, which can influence the efficiency of the model selection. For example, with Bayesian shrinkage method [14], the hyper-parameter is a variable and assigned a special distribution so that no model selection is need. In Bayesian composite space approach, the updating of model indicator variables is closely dependent on QTL effects, but the selection of hyper-parameter is not much strict as Bayesian shrinkage analysis. Many approaches have been proposed for selection of hyper-parameter, and our method is only an extension of the approach of Yi et al. [15]. Moreover, we followed the approaches developed by Yi et al. [15] to obtain the prior probability for model indicator variables. However, we didn't investigate the influence of different prior probability on the results, because the proposed method is very computationally intensive. In addition, we suggested to use CIM-based multitrait analysis [2] to obtain the prior of variance-covariance of residual, but if prior information is not indeed known, we may take the noninformative prior [19], <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M2">View MathML</a>. In this simulation study, the noninformative prior is used and proved to be able to bring a precise estimate for variance-covariance of residual error.

The proposed multitrait analysis is based on Bayesian composite space approach, while other popular model selection approaches such as Bayesian shrinkage method [14] and Bayesian SSVS method [23] are also very easily extended, and the details will be demonstrated in another paper. We used BC and DH population as examples to demonstrate the efficiency of the method. The new method can be modified to be applied to other experiment designs, such as RIL, F2 design, etc. In addition, we only take the main effect into account, while the epistatic effect also can be included into the model. In that case, the model should be written as: <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M3">View MathML</a>, where q is main effect, q1 and q2 is two interacting QTL, and <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M4">View MathML</a> is (1 × m) column vectors of epistatic effect between QTL q1 and q2. Certainly, the implementation will be complicated and quite time-consuming, but nevertheless, the extension is feasible and expected to be very efficient for mapping interacting QTL.

In this paper, we have not given a test procedure to distinguish closely linked and pleiotropic QTL which cause the genetic correlations between each trait. There have been some of literatures about it, and generally, the likelihood ratio (LR) statistic [1,2] and Bayesian factor (BF) statistic [7] always have been used to solve the problem [7]. In our multitrait analysis, although the LR testing procedure in [2] is completely applicable, it is not optimal, because it is based on single-QTL model. Also Bayesian approach can be used for such testing, but the computing time is a big factor of concern. Hopefully, an efficient and fast approach will be developed that could solve the problem nicely.

Conclusion

Bayesian composite space approach [18] is an effective method for model selection. Yi [16] firstly used it for QTL mapping and proved it to be effective for mapping multiple QTL. In this article, we extended this novel statistical method to multitrait mapping of QTL. Compared with separate analysis, joint analysis is optimal, because the parameters are updated by vector or matrix and the correlation information between multiple traits can be made good use of. The powerful of the proposed multitrait method also be proved by both simulation experiments and real data analysis, and they all showed that the multitrait analysis tends to give higher statistical power than the single trait analysis.

Methods

Multivariate linear model

Consider n individuals derived from a backcross population crossed from two inbred lines with observations on some densely distributed codominant markers and on m quantitative traits. Supposed that the maximum number of QTL is p, the phenotypic value yki of individual i for kth trait can be described by the following multivariate linear model:

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M5">View MathML</a>

(1)

for i = 1, 2, ..., n and k = 1, 2, ..., m, where γkj is model indicator variable, indicating the jth QTL of kth trait included (1) or excluded (0) from the model; bk0 is population mean; bkj is QTL effect; xkij is QTL genotype, if QTL genotype is homozygote xkij = 1, otherwise -1; eki is residual error and assumed to follow multivariate normal distribution. If we denote equation (1) by matrix, it can be expressed as:

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M6">View MathML</a>

(2)

for i = 1, 2, ..., n, where yi = [y1i, y2i, ..., ymi]T, b0 = [b10, b20, ..., bm0]T, bj = [b1j, b2j, ..., bmj]T, ei = [e1i, e2i, ..., emi]T. They are all (1 × m) column vectors. Equation (3) is QTL genotype matrix and Equation (4) is model indicator matrix, they are all (m × m) diagonal matrix.

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M7">View MathML</a>

(3)

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M8">View MathML</a>

(4)

Prior specification

The prior distribution of each QTL effect vector bj is multivariate normal distribution, p(bj) ~ N(0, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M9">View MathML</a>), where <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M9">View MathML</a> is the hyper-parameter, and We take <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M10">View MathML</a>, which is simply an extension from Bayesian single trait analysis [15]. The importance of the choice of the hyper-parameter will be discussed later. In a large backcross population and under the definition of xmij (-1 or 1), <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M9">View MathML</a> can be simplified as <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M9">View MathML</a> = Σe. The prior of the covariance matrix of residual error follows Inverse Wishart distribution, Σe ~ Wishart-1(ve, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M11">View MathML</a>), where, ve and <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M11">View MathML</a> are prior degree of freedom and covariance matrix of residual error, respectively, and can be obtained from other method, such as CIM based multitrait analysis [2], etc. The prior distribution of population mean b0 is normal distribution with mean and variance equal to those calculated by phenotypic values. The prior probability distribution of QTL position λkj is uniform distribution with bounds of two flanking markers, p(λkj) = 1/dj, where dj is length of the interval where jth QTL is confined. Assuming that epistatic effect is absent, the prior inclusion probability for jth effect can be expressed as p(γkj = 1) = 1 - lk/Lk]1/N (see also [15]), where lk is the prior expected number of main-effect QTL, and could be roughly estimated with the use of standard genome scans; N is the number of possible main effects for each QTL and equal to 1 in BC family [15]; Lk is the upper bound of QTL number, and equals to the number of marker interval in our simulation study, while in another approach suggested by Yi [15]Lk is taken as 3 + 3·<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M1">View MathML</a>, which causes the model space to reduce dramatically [15].

Joint posterior density

The observable variables include phenotypic values, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M12">View MathML</a> and marker information, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M13">View MathML</a>. The unobservable variables include population mean, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M14','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M14">View MathML</a>; QTL effects, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M15">View MathML</a>; QTL genotypes, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M16">View MathML</a>; model indicator variables, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M17','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M17">View MathML</a>; (co)variance of residual error, Σe, and QTL positions, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M18','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M18">View MathML</a>. Let θ be the vector of hyper-parameters, Θ = {b0, b, Σe, λ, X, Φ}, then the joint prior density of the unobservable variables is denoted by p(Θ|θ). The joint posterior probability of Θ, given the observable variables y and m, can be expressed as:

p(Θ|y, m) ∝ p(Θ|θp(y, m|Θ),(5)

where, p(y, m|Θ) is the likelihood and can be written as:

p(y, m|Θ) = p(y|Θp(m|Θ),(6)

where p(y|Θ) is multivariate normal density, and p(m|Θ) can be derived from a Markov model [14].

MCMC sampling

MCMC algorithm generates samples from Markov chains which converge to the posterior distribution of parameters, without the constant of proportionality being calculated. From these posterior samples, summary statistic of the posterior distribution can be calculated. MCMC algorithm proceeds as follows:

a. Initialize all parameters with values in their legal domain.

b. Update the population mean b0.

c. Update the QTL effects vectors <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M19">View MathML</a>.

d. Update the variance-covariance matrix Σe of the residual error.

e. Update the QTL genotype indicator matrices <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M20">View MathML</a> and the QTL location vectors <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M21','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M21">View MathML</a> jointly, for j = 1, 2,..., p.

f. Update the model indicator variable matrices <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M22">View MathML</a>.

The conditional posterior distribution of the population mean b0 is multivariate normal with mean

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M23','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M23">View MathML</a>

(7)

and variance-covariance matrix

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M24','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M24">View MathML</a>

(8)

The conditional posterior distribution of the QTL effect bj is sampled from multivariate normal distribution with mean

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M25">View MathML</a>

(9)

and variance-covariance matrix

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M26','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M26">View MathML</a>

(10)

The posterior distribution of the residual error follows inverted Wishart distribution,

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M27','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M27">View MathML</a>

(11)

where <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M28','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M28">View MathML</a> and dfe = n.

In step e, the QTL locations and QTL genotype matrices are updated jointly. For locus j, we can firstly sample a new QTL position for each trait from their prior distribution (described later), then sample the QTL genotype matrices <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M20">View MathML</a> on the new position using equation (15), and finally, they are updated by the efficient Metropolis-Hastings algorithm [20,21]. Because the sampling of Xij is too complicate and we are going to firstly describe it. Due to the QTL genotype xkij has two possible values (-1 or 1) in BC line, if m traits are investigated jointly, Xij has 2m kinds of possible formations, and the general pattern of Xij can be written as:

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M29">View MathML</a>

(12)

where, z1, z2, ..., zm ∈ {-1, 1}. For clarity, we omit the subscript ij from <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M30','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M30">View MathML</a> and present formulas <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M31','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M31">View MathML</a> to denote the genotype matrix of ith individual and jth loci. Because the QTL genotypes xkij of ith individual in the jth interval for all traits may be correlated, the joint prior probability of the genotype matrix Xij can't be simply expressed by the following equation:

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M32','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M32">View MathML</a>

(13)

Instead, it can be derived from the Markov model (see Equation 14), assuming that the order of markers and QTL is MjQ1Q2 ... QmMj+1 (see Figure 7), where, Q1, Q2, ..., and Qm denote the QTL respectively affecting trait 1, trait 2, ..., and trait m in jth marker interval. Indicator variables x1ij, x2ij, ..., and xmij denote the genotypes of these QTL.

thumbnailFigure 7. The positions of markers and QTL and their sequence ranged on a certain marker interval.

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M33','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M33">View MathML</a>

(14)

If no segregation interference is considered, the joint prior probability can be factorized into equation (14), and each term in equation (14) can be derived from Haldane map function. Only the first term in equation (14) is conditional on two flanking markers; others are not only conditional on two flanking markers but also on the genotypes of all the QTL prior to the interested one. If double recombination is ignored [2], each term in equation (14) can be inferred only by the genotype of the left nearest loci (marker or QTL) and the right marker, then equation (14) can be simplified as:

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M34','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M34">View MathML</a>

(15)

Each term in equation (15) can be easily inferred.

It is worth mentioning that we assume the sequence of markers and QTL is MjQ1Q2 ... QmMj+1, and in fact, the sequence of QTL may be variable in each round of updating. Therefore, we should firstly ascertain the sequence in each round, and then construct the appropriate formula to calculate the joint prior probability of the QTL genotype p(Xij = <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M31','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M31">View MathML</a>|mi,j,λj,mi,j+1) according above rules. For clarity, we take an example to demonstrate it. Consider 3 QTL Q1, Q2, and Q3 that affect 3 traits respectively in an interval. Assuming that in a certain round the sequence of markers and QTL is MjQ3Q1Q2Mj+1, then the formula for calculating the joint prior probability of the QTL genotype can be written as:

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M35','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M35">View MathML</a>

Once we obtain the joint prior probability of the QTL genotype, the joint conditional posterior probability of Xij can be expressed as:

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M36','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M36">View MathML</a>

(16)

where <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M37','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M37">View MathML</a> is likelihood, and follows multivariable normal distribution,

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M38','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M38">View MathML</a>

(17)

Once we have calculated 2m possible posterior probabilities for the corresponding QTL genotype matrices, we are going to sample one genotype matrix according to their posterior probabilities. We firstly constructed the cumulative probability function F(d) by accumulating the 2m probabilities in an arbitrary sequence for d = 1, 2, ..., 2m and F(0) = 0, which is a discrete distribution; then sampled a random number from uniform distribution, u ~ U[0,1]; and compared u with F(d), if F(d - 1) <u F(d), then the dth genotype matrix is accepted.

The new sampled QTL genotype matrices <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M20">View MathML</a> are only the proposal value, which should be updated along with the proposal QTL position vector λj = [λ1j, λ2j, ..., λmj] by the Metropolis-Hastings algorithm [20,21]. For each trait, the new proposal position is sampled around the existing one from uniform distributions, <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M39','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M39">View MathML</a> ~ [λkj - δ, λkj + δ), where δ is tuning parameter, usually taking a value of 1 or 2 cM. The new position vector is denoted by <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M40','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M40">View MathML</a>; then the new QTL genotype matrix <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M41','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M41">View MathML</a> is sampled conditionally on the new position using equation (16); finally, the position vector <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M42">View MathML</a> and genotype matrices <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M20">View MathML</a> are accepted jointly with probability equal to min(1,α), where

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M43','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M43">View MathML</a>

(18)

p(<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M42">View MathML</a>) and p(λj) is the prior probability of new and old position respectively, and they are cancelled out under uniform prior distribution; <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M44','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M44">View MathML</a> and p(Xij|λj, ...) is the prior probability of QTL genotype conditional on new and old position, which has been described detailed previously; <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M45','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M45">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M46','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M46">View MathML</a>, are all proposal ratio.

In step f, block sampling of the indicator variable matrix Φj is expected to have a better performance than separately updating each γkj in Φj. Due to there are two possible values (0 or 1) for each model indicator γkj, if m traits are investigated jointly, each model indicator matrix Φj has 2m kinds of formations. The general formula of it can be written as:

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M47','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M47">View MathML</a>

(19)

where, wk ∈ {0,1}, for k = 1, 2, ..., m. Because the prior probability of each γkj is independent, the joint prior probability for all possible formations can be written as <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M48','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M48">View MathML</a>. Then the conditional posterior probability of Φj can be written as

<a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M49','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M49">View MathML</a>

(20)

The approach to sample Φj is similar to QTL genotypes sampling previously mentioned.

Post-MCMC analysis

For summarizing the posterior sample, we use the mean of the posterior sample to estimate the QTL effect and the residual (co)variance, and the mode of the posterior probability or the peak of the 2logeBF statistic to localize QTL. 2logeBF statistic was introduced by Yi et al.[17] into QTL mapping, and BF statistic is defined as the ratio of the posterior odds to the prior odds for inclusion against exclusion of the locus [24]. The critical value of BF is 3 or 2logeBF = 2.1 for declaring the existence of a QTL.

In single-trait analysis, we can pick the QTL by plotting the profile of the posterior probability or 2logeBF statistic against the genome. In multitrait analysis, if only two traits are considered jointly, we can use a three-dimension graph to summarize the statistic for all traits jointly (e.g., Figure 2 in [19]). However, if the number of trait is greater than 2, we can't plot them in one graph. Instead, we can solve the problem by plotting the marginal posterior probability distribution. If we divide the genome into H bins, and denote each bin of kth trait with ζkg, for g = 1,2, ..., H, then the marginal posterior probability distribution of ζkg is defined as p(ζkg|y) = p[(ζkg = λkq) ∩ (γkq = 1)], where, q indicates the qth interval that locus ζkg resides in. Then <a onClick="popup('http://www.biomedcentral.com/1471-2156/9/48/mathml/M50','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2156/9/48/mathml/M50">View MathML</a>, which can be calculated at each possible locus for each trait, respectively.

Authors' contributions

MF coordinated the study, developed the foundational principle of the method and wrote the computing program and the paper. Others were responsible for the simulation experiment, carried out the analysis of results and helped to consummate the whole paper.

Acknowledgements

We deeply thank four anonymous reviewers for their criticisms and comments which have greatly improved the presentation of the manuscript. This work was partly supported by Heilongjiang August First Land Reclamation University.

References

  1. Xu CW, Li ZK, Xu S: Joint mapping of quantitative trait loci for multiple binary characters.

    Genetics 2005, 169:1045-1059. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  2. Jiang C, Zeng ZB: Multiple trait analysis of genetic mapping for quantitative trait loci.

    Genetics 1995, 140:1111-1127. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  3. Knott SA, Haley CS: Multitrait least squares for quantitative trait loci detection.

    Genetics 2000, 156:899-911. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  4. Korol AB, Ronin YT, Itskovich AM, Peng J, Nevo E: Enhanced efficiency of quantitative trait loci mapping analysis based on multivariate complexs of quantitative traits.

    Genetics 2001, 157:1789-1803. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  5. Mangin B, Thoquet P, Grimslev N: Pleiotropic QTL analysis.

    Biometrics 1998, 54:88-99. Publisher Full Text OpenURL

  6. Eaves LJ, Neale MC, Maes H: Multivariate multipoint linkage analysis of quantitative trait loci.

    Behav Genet 1996, 26:519-525. PubMed Abstract | Publisher Full Text OpenURL

  7. Liu JF, Liu YJ, Liu XG, Deng H-W: Bayesian mapping of quantitative trait loci for multiple complex traits with the use of variance components.

    Am J Hum Genet 2007, 81:304-320. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  8. Satagopan JM, Yandell BS, Newton MA, Osborn TC: A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo.

    Genetics 1996, 144:805-816. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  9. Yi N, Xu S: Bayesian mapping of quantitative trait loci for complex binary traits.

    Genetics 2000, 155:1391-1403. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  10. Yi N, George V, Allison DB: Stochastic search variable selection for identifying multiple quantitative trait loci.

    Genetics 2003, 164:1129-1138. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  11. Yi N, Xu S, Allison DB: Bayesian model choice and search strategies for mapping multiple epistatic quantitative trait loci.

    Genetics 2003, 165:867-883. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  12. Yi N, Xu S, Allison DB: Bayesian model choice and search strategies for mapping interacting quantitative trait loci.

    Genetics 2003, 165:867-883. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  13. Xu S: Derivation of the shrinkage estimates of quantitative trait locus effects.

    Genetics 2007, 177:1255-1258. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  14. Wang H, Zhang YM, Li XM, Masinde GL, Mohan S, Baylink DJ, Xu S: Bayesian shrinkage estimation of quantitative trait loci parameters.

    Genetics 2005, 170:465-480. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  15. Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D: Bayesian model selection for genome-wide epistatic quantitative trait loci analysis.

    Genetics 2005, 170:1333-1344. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  16. Yi N: A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci.

    Genetics 2004, 167:967-975. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  17. Yi N, Shriner D, Banerjee S, Mehta T, Pomp D, Yandell BS: An efficient Bayesian model selection approach for interacting quantitative trait loci models with Many Effects.

    Genetics 2007, 176:1865-1877. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  18. Godsill SJ: On the relationship between MCMC model uncertainty methods.

    J Comput Graph Stat 2001, 10:230-248. Publisher Full Text OpenURL

  19. Gelman A, Carlin J, Stern H, Rubin D: Bayesian Data Analysis. London, Chapman & Hall; 2004. OpenURL

  20. Hastings WK: Monte Carlo sampling methods using markov chains and their applications.

    Biometrika 1970, 57:97-109. Publisher Full Text OpenURL

  21. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equations of state calculations by fast computing machines.

    J Chem Phys 1953, 21:1087-1091. Publisher Full Text OpenURL

  22. Tinker NA, Mather DE, Rossnagel BG, Kasha KJ, Kleinhofs A, Hayes PM, Falk DE, Ferguson T, Shugar LP, Legge WG, Irvine RB, Choo TM, Briggs KG, Ullrich SE, Franckowiak JD, Blake TK, Graf RJ, Dofing SM, Saghai Maroof MA, Scoles GJ, Hoffman D, Dahleen LS, Kilian A, Chen F, Biyashev RM, Kudrna DA, Steffenson BJ: Regions of the genome that affect agronomic performance in two-row barley.

    Crop Sci 1996, 36:1053-1062. OpenURL

  23. Yi N, George V, Allison DB: Stochastic search variable selection for identifying multiple quantitative trait loci.

    Genetics 2003, 164:1129-1138. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  24. Kass RE, Raftery AE: Bayes factors.

    J Am Stat Assoc 1995, 90:773-795. Publisher Full Text OpenURL