Abstract
Background
Genomic selection is an effective tool for animal and plant breeding, allowing effective individual selection without phenotypic records through the prediction of genomic breeding value (GBV). To date, genomic selection has focused on a single trait. However, actual breeding often targets multiple correlated traits, and, therefore, joint analysis taking into consideration the correlation between traits, which might result in more accurate GBV prediction than analyzing each trait separately, is suitable for multitrait genomic selection. This would require an extension of the prediction model for singletrait GBV to multitrait case. As the computational burden of multitrait analysis is even higher than that of singletrait analysis, an effective computational method for constructing a multitrait prediction model is also needed.
Results
We described a Bayesian regression model incorporating variable selection for jointly predicting GBVs of multiple traits and devised both an MCMC iteration and variational approximation for Bayesian estimation of parameters in this multitrait model. The proposed Bayesian procedures with MCMC iteration and variational approximation were referred to as MCBayes and varBayes, respectively. Using simulated datasets of SNP genotypes and phenotypes for three traits with high and low heritabilities, we compared the accuracy in predicting GBVs between multitrait and singletrait analyses as well as between MCBayes and varBayes. The results showed that, compared to singletrait analysis, multitrait analysis enabled much more accurate GBV prediction for lowheritability traits correlated with highheritability traits, by utilizing the correlation structure between traits, while the prediction accuracy for uncorrelated lowheritability traits was comparable or less with multitrait analysis in comparison with singletrait analysis depending on the setting for prior probability that a SNP has zero effect. Although the prediction accuracy with varBayes was generally lower than with MCBayes, the loss in accuracy was slight. The computational time was greatly reduced with varBayes.
Conclusions
In genomic selection for multiple correlated traits, multitrait analysis was more beneficial than singletrait analysis and varBayes was much advantageous over MCBayes in computational time, which would outweigh the loss of prediction accuracy caused by the approximation procedure, and is thus considered a practical method of choice.
Keywords:
Genomic selection; Multiple traits; Bayesian regression; MCMC iteration; Variational approximationBackground
A huge number of genomewide polymorphisms have recently been elucidated in livestock and crops with the development of sequencing technologies. Highthroughput genotyping systems, such as highdensity SNP chips containing several tens or hundreds of thousands of genomewide SNP markers and GBS (genotyping by sequence) [1], have become available to efficiently identify genotypes of individuals for a large number of SNPs at low cost. Consequently, genomic selection [2] is attracting attention as a new breeding technology utilizing the information of genomewide dense SNP markers. In genomic selection, a model to predict genomic breeding value (GBV) based on genomewide SNP genotype is firstly constructed using a training population consisting of individuals with both SNP genotypes and phenotypes of a trait and, subsequently, using this model, the GBVs of a trait are predicted for individuals in the tested population, which are selection candidates, based on their SNP genotypes, allowing effective individual selection to be performed without phenotypic records using the predicted GBV. Therefore, genomic selection requires the construction of prediction models being able to accurately relate genotypes of genomewide SNPs to GBV.
In the original study of genomic selection by Meuwissen et al. [2], two Bayesian methods were presented for construction of prediction models, which were referred to as BaeysA and BayesB and have been used for the studies of genomic selection in their original or modified forms [3,4]. The model of BayesA is equivalent to the Bayesian shrinkage regression (BSR) model [5], where all SNPs are included in the prediction model as covariates and the estimates of SNP effects are shrunk by assuming a normal distribution with mean 0 and SNPspecific variance as prior distributions of SNP effects, resulting in negligible estimates for the effects of many SNPs irrelevant to phenotype, but significantly large estimates for the effects of SNPs contributing to phenotype. On the other hand, the BayesB method is regarded as a variant of Bayesian stochastic search variable selection (BSSVS) [6], where the prior probability, π, that a SNP has zero effect and is removed from the model was considered in the model fitting to obtain the best model explaining the phenotypes with a small number of SNP effects. A normal distribution with mean 0 and SNPspecific variance is assumed for the prior distribution of SNP effects in BayesB, as in BayesA, if SNPs are included in the model with probability 1π and, otherwise, both the mean and variance are fixed at 0 with probability π.
In BayesA and BayesB, an additional hierarchical structure is induced for this SNPspecific variance, where an inverted chisquare distribution with degree of freedom ν and scale parameter S is adopted as the prior distribution of the variance. However, only the information of the relevant single SNP can be used for the posterior inference of this SNPspecific variance regardless of the number of genotypes or phenotypes. Due to the insufficient Bayesian learning for the variance, the degree of shrinkage for the estimates of SNP effects is largely influenced by the prior setting for ν and S in BayesA and BayesB [7]. In BaeysB, the sparseness of the model and the prediction accuracy are also greatly affected by the prior probability, π, that a SNP has zero effect, which is treated as a known fixed value. To overcome these drawbacks of BayesA and BayesB, the modified methods such as BayesC and BayesD were proposed. In BayesC, a single variance common to all SNPs is adopted for the prior distribution of SNP effects while BayesD allows S to be inferred from the data, given ν. Furthermore, the step for inferring π, which is given as a fixed value in BayesB, can be incorporated in the procedure of BayesC and BayesD and the corresponding methods are termed BayesCπ and BayesDπ, respectively [4].
These Bayesian methods were mainly developed for genomic selection of a single trait. However, actual breeding of animals and plants often aims to simultaneously improve multiple correlated traits. Therefore, joint prediction of GBVs for multiple traits, taking into consideration the correlation structure between traits, is suitable for multitrait genomic selection, which requires the extension of existing methods for singletrait GBV prediction to multitrait case. In QTL mapping methods that use similar models to those of genomic selection, some researchers have developed multitrait models [810]. Xu et al. [8] extended the BSR model to a multitrait case introducing the correlation structure between traits in estimation of QTL effects and other nongenetic effects. Banerjee et al. [9] employed a Bayesian composite model space approach [11,12] for multitrait QTL mapping, where a variable indicating inclusion or exclusion of each QTL was incorporated for constructing a model with the smallest possible number of QTLs, a similar approach to BSSVS adopted in BayesB, and each trait was allowed to have a different model by assuming the traitspecific effects for QTLs and nongenetic factors that were assumed to be uncorrelated between traits. Meuwissen and Goddard [10] proposed a multitrait BSSVS model for QTL mapping. These methods can be applied for prediction of multitrait GBVs in genomic selection. Calus and Veerkamp [13] applied the method proposed in [10] with some modification for multitrait GBV prediction to compare the accuracy between singletrait prediction and multitrait prediction in genomic selection.
The computational procedure with MCMC iteration is generally used for the Bayesian methods to estimate parameters in the models, which become complicated models in genomic selection, including a huge number of SNPs as covariates and SNP effects that are estimated as their regression coefficients. The computational burden of MCMCbased Bayesian methods, which requires a long time until convergence when estimating many parameters is huge even in singletrait GBV prediction and would be further increased in the case of multiple traits, thus hindering the MCMC procedure depending on the number of traits to be jointly analyzed. Therefore, it would be necessary to devise a solution for reducing the computational burden of Bayesian methods for multitrait GBV prediction. So far, some nonMCMC computational procedures for Bayesian methods have been proposed in QTL mapping and genomewide association study, including EMalgorithm [14] and variational approximation [15,16]. The EMalgorithm was also applied to singletrait GBV prediction, successfully reducing the computational time [17,18].
In this paper, we propose Bayesian methods for multitrait GBV prediction, in which BSR models allowing variable selection are developed and MCMC procedures for estimating model parameters including SNP effects are described as well as a computationally costeffective nonMCMC method using variational approximation as an alternative computational procedure. Hereafter, the Bayesian methods based on MCMC iteration and variational approximation are referred to as MCBayes and varBayes, respectively. The multitrait Bayesian models described here include a Bayesian shrinkage regression (BSR) models that are equivalent to those adopted by BayesA and BayesD when variable selection is not conducted, and the models of BSSVS that are regarded as slightly modified versions of BayesB and BayesDπ methods when a step of variable selection step is incorporated. We develop computationallyeffective variational approximation procedures to construct the posterior distributions of parameters of these Bayesian models.
Using simulated datasets consisting of genotypes of genomewide dense SNPs and phenotypes of three correlated traits with high and low heritabilities, we investigated the differences in prediction accuracy for each trait between multitrait analysis, where GBVs of three traits were simultaneously predicted taking the correlation structure between traits into consideration, and singletrait analysis, where each trait was separately predicted for GBV. We also evaluated the prediction accuracy of the varBayes methods in comparison with MCBayes. Moreover, we investigated the performance of multitrait analysis in simulated data including missing phenotypes.
Methods
In this section, we describe Bayesian models for multitrait GBV prediction and computational procedures for Bayesian estimation of the model parameters, including construction of posterior distributions of the parameters using MCMC iteration and variational approximation. Here, we consider the statistical models for BSR and BSSVS, which are shown to be equivalent to BayesD and similar to BayesDπ, respectively. In these Bayesian models concerned, BSR is regarded as a special version of BSSVS by setting π =0, where π is a prior probability that a SNP does not contribute to traits; thus, we present the multitrait GBV prediction models in a unified fashion in terms of BSSVS.
We assume that the number of SNPs genotyped is N and a training dataset including n individuals with phenotypic records of multiple traits, where the number of traits is denoted by T, and SNP genotypes is available for estimating parameters in the prediction model. We also assume that a test dataset consists of individuals with only SNP genotypes, for each of which GBV is predicted. We denote two alleles at each SNP by 0 and 1 and three genotypes by ‘0_0’, ‘0_1’, and ‘1_1’.
Models for Bayesian stochastic search variable selection in multitrait genomic selection
We propose the following Bayesian multilocus linear model for the phenotypes of T traits of the ith individual, which are denoted as a Tx1 vector y_{i} (i = 1,2,…,n), as a BSSVS model for multitrait GBV prediction:
where b is a f × 1 vector of nongenetic effects including interceptions of the model with X_{i} being a T × f design matrix linking b to the ith individual; u_{il} is a variable indicating the genotype of the ith individual at the lth SNP taking a value of −1, 0 or 1 corresponding to the genotypes, ‘0_0’, ‘0_1’, or ‘1_1’, respectively; g_{l} is a T × 1 vector of the effects of the lth SNP on the phenotypes of T traits; γ_{l} is a variable indicating the inclusion of the lth SNP in the model with 1 or 0 depending on whether or not the lth SNP is included in the model; and e_{i} is a T × 1 vector of residual errors following a Tvariate normal distribution N(0, Σ_{e}) with 0 being a T × 1 vector of zeros and Σ_{e} being a TxT covariance matrix of e_{i}.
Within the Bayesian framework, prior distributions are assigned to the parameters of the model (1). We assume that the priors of the elements of b are the improper uniform distribution over the possible values. The prior probabilities that γ_{l} =1 and γ_{l} =0 are presented as 1 π and π, respectively, considering the prior probability that a SNP does not contribute to the trait and is excluded from the model is π. The prior distribution of g_{l} given γ_{l} has the form
where Σ_{gl} is a T × T matrix that is the variance and covariance matrix of g_{l} given γ_{l} =1 and δ (0) is the delta function that concentrates a total mass at zero for all T elements of g_{l}. We induce a hierarchical structure for Σ_{gl} by assigning an inverse Wishart distribution with degree of freedom ν and scale parameter S, denoted by IW_{T}(ν,S), to the prior distribution of Σ_{gl} :
Although the SNP effect g_{l} is zero and irrelevant to Σ_{gl} when γ_{l} =0 as shown in (2), we assume that Σ_{gl} is a priori distributed as given in (3) regardless of γ_{l}. We treat ν as a given fixed value but infer S from the data within the Bayesian framework. For simplicity, we assume here that S is a diagonal matrix with the kth diagonal element being s_{k} , that is S = diag(s_{k}) (k = 1,2,…,T), and we adopt the improper uniform distribution over positive values for a prior distribution of s_{k}. The residual variance and covariance matrix Σ_{e} is assumed to have a uniform distribution over the positive definite matrices as a prior distribution.
Denoting these parameters of the Bayesian model collectively by θ, the prior distribution of θ by p(θ) and observed phenotypes y_{i} and SNP genotypes u_{il} (i = 1,2,…,n;l = 1,2,…,N) by Y and U, respectively, we can write the posterior distribution of θ, g(θν,Y, U), as
from (1), (2) and (3), where y_{i}* is a residual given by
Given that S is fixed, posterior distribution (4) is equivalent to that of BayesA extended to multitrait case when π = 0 leading to γ_{l} =1 for all SNPs. When π > 0, the selection of SNPs to be included in the model is performed as in BayesB. However, it should be noted that posterior distribution (4) is not equivalent to that of the multitrait version of BayesB, which supposes that Σ_{gl} is a zero matrix as well as g_{l} when γ_{l} =0, while, in (4), the prior distribution of Σ_{gl} is assumed to be IW_{T}(ν, S) regardless of γ_{l}. The form of posterior distribution (4) allows Gibbs sampling in the MCMC estimation for all parameters including g_{l} and Σ_{gl}. The full conditional posterior distributions of parameters used in MCBayes are described in Additional file 1.
Additional file 1. Derivation of full conditional posterior distributions of parameters in a statistical model.
Format: PDF Size: 47KB Download file
This file can be viewed with: Adobe Acrobat Reader
Variational approximation procedure for multitrait Bayesian model
We adopted variational approximation as an alternative to MCMC iteration for constructing marginal posterior distributions of parameters based on joint posterior distribution (4). In the variational approximation procedure, the joint posterior distribution is approximated by the product of functions for subsets of parameters with lower dimension. Briefly, we assume that g(θν, Y, U) is approximated by a function of θ, q(θ), which is factorized as q(θ) = q_{1}(θ_{1}) q_{2}(θ_{2})…q_{K}(θ_{K}), where θ_{1}, θ_{2},… ,θ_{K} are mutually exclusive subsets of θ such that ∪ _{k = 1}^{K}θ_{k} = θ and q_{k} (θ_{k}) (k = 1,2,…,K) may generally depend on νY and U although this dependence is omitted here for simplicity. This approximating marginal posterior distribution, q_{k}(θ_{k}), is referred to as the variational posterior of θ_{k}[19]. The form of q_{k}(θ_{k}) is determined such that the Kullback–Leibler divergence between g(θνY, U) and q(θq_{1}(θ_{1}) q_{2}(θ_{2})…q_{K}(θ_{K}), D(qg), is minimized [20], where D(qg) is defined as
It can be shown that q_{k}(θ_{k}) is expressed as
where C is a normalized constant and E_{}_{θ}_{k}[.] indicates the expectations of parameters other than θ_{k} that are calculated with respect to every other parameter’s variational posteriors except q_{k}(θ_{k}) [21].
In the varBayes method that applies the variational approximation procedure to the multitrait Bayesian model considered here, g(θν,Y, U) is assumed to be approximated by a factorized function q(θ) that is written as
where we denote all of the variational posteriors for the different parameters in the righthand side by q(.) for simplicity with the understanding that q(b), q (Σ_{e}) and so on take different forms depending on the parameters. The forms of these variational posteriors are derived from (5) in a manner similar to that used for derivation of the full conditional posteriors for the parameters in Gibbs sampling and the computational details are given in Additional file 2. In the following, we will give the variational posteriors for parameters, γ_{l}, g_{l}, Σ_{gl} (l = 1,2,…,N), b, Σ_{e} and S.
Additional file 2. Derivation of variational posteriors for parameters in a statistical model.
Format: PDF Size: 66KB Download file
This file can be viewed with: Adobe Acrobat Reader
The variational posterior for b, q(b),is a multivariate normal density with mean
and variancecovariance matrix
where E(.) indicates the expectation calculated with respect to the variational posteriors of relevant parameters, while q (Σ_{e}) is IW_{T} (nT1, S_{e}) with
from which we obtain
The variational posterior of Σ_{gl} is represented by IW_{T} (ν_{gl}, S_{gl}), where
and
The expectation of Σ_{gl}^{1} with respect to this posterior distribution is
For γ_{l} and g_{l}, we consider joint distribution for variational posterior q(γ_{l}, g_{l}) that is expressed as
where is a density function of a multivariate normal distribution with mean
and variancecovariance matrix
The marginal posterior distribution of γ_{l} is a binomial distribution with probabilities of γ_{l} =1 and 0 given by
and
The conditional distribution of g_{l} given γ_{l} is given by
Therefore, we obtain
and
From (4), the variational posterior of S is a Tvariate Wishart distribution with a scale matrix Σ_{S} and degree of freedom ν_{s}, W_{T}(ν_{s}, Σ_{S}), where and ν_{s} = Nν + T + 1, and the expectation of S is expressed as
As outlined above, a wellknown probability distribution, such as normal, inverse Wishart and so on, is assigned to the variational posterior of each parameter, which is characterized by the expectations of the functions of other parameters, taken with respect to their variational posteriors. The relationships between these expectations are given by (6), (8), (9), (13), (14), (15) and (16), from which the expectations can be calculated with numerical iterations to obtain the variational posteriors.
Moreover, the prior probability for a SNP to have zero effect, π, can be treated as a variable parameter to be inferred from the data as in BayesCπ and BayesDπ when 0 < π < 1. Here, we assume a uniform prior on 0 < π < 1 to obtain a Beta distribution, B(a,b), as a variational posterior from (4), where and , with the expectation
Accordingly, the variational posteriors of the other parameters are modified with π substituted by E(π) when π is involved in the Bayesian inference.
Treatment of missing phenotypes
In the phenotypic records for multiple traits, it is common for trait values to be partially missing in some individuals. Missing phenotypes of a trait in individuals can be inferred with the observed phenotypes of other traits in the same individuals. The step for inferring missing phenotypes can be implemented in the MCBayes and varBayes procedures as described below.
When there are missing phenotypes in an individual, the residual vector of the individual, e in model (1), is partitioned into components e_{o} and e_{m} corresponding to observed and missing phenotypes, respectively. Following [22], e_{m} can be sampled from the following normal distribution,
where Σ_{mm}, Σ_{mo} and Σ_{oo} indicate the partition of the residual variancecovariance matrix Σ_{e} corresponding to e_{m} and e_{o}. Accordingly, e_{m} is drawn with Gibbs sampling in MCBayes while it is obtained as E(Σ_{mo})E(Σ_{oo})^{1}e_{o}* in varBayes, where, denoting the component corresponding to the missing traits by subscript ‘o’, e_{o}* is written as
and the expectations can be calculated with the variational posteriors of Σ_{e}, b and g_{l} . Missing phenotypes are inferred as the sum of the estimates of b, g_{l} and e_{m}, which are used for the construction of the prediction model.
Simulation experiments
We simulated datasets to evaluate the accuracy of the predicted GBVs using the proposed Bayesian methods, MCBayes and varBayes, for multiple traits. In generating the datasets, three traits, denoted as A, B and C, were considered. The simulation of population and genome was carried out following [3] where a single trait was treated, while multiple traits were generated through a modified approach. The simulated population, genotypes and phenotypes are described in the following.
Populations with an effective population size of 100 were maintained by random mating for 1000 generations to attain mutation drift balance and LD between SNPs and QTLs. In generation 1001 and 1002, the population size was increased to 1000. The population in the 1001st generation was treated as a training population, where the phenotypes of three traits and SNP genotypes of the individuals were simulated and analyzed to estimate the SNP effects in the model. The phenotype of each trait for each individual in the 1001st generation was given as the sum of QTL effects over the polymorphic QTLs and environmental effects, which were sampled as described later. For simplicity, no other fixed effects were assumed. The population in the 1002nd generation was used as a test population, where the individuals were only genotyped for SNP markers without phenotypic records and GBVs of three traits were predicted for each individual using a model with SNP effects estimated based on the training population in the 1001st generation. The true breeding value (TBV) of the individual in the 1002nd generation was also simulated as the sum of QTL effects corresponding to the QTL genotype for each trait and used for evaluating the accuracy of predicted GBV, but was regarded as unknown and unavailable in the estimation of SNP effects in the models. Accuracy was measured based on the correlation between the TBV and predicted GBV, r_{TBV,pGBV}, for each trait and regression of TBV on predicted GBV, b_{TBV,pGBV}, was also obtained for assessing the bias of the prediction.
The genome was assumed to consist of 10 chromosomes each 100cM in length. Two scenarios were considered for the number of available SNP markers and the datasets under these two scenarios were denoted as Data I and Data II. In Data I, 101 marker loci were located every 1cM on each chromosome for a total of 1010 markers on a genome. In Data II, 1010 equidistant marker loci were located on each chromosome for a total of 10100 markers. We assumed that 100 equidistant QTLs were located on each chromosome such that a QTL was in the middle between two marker loci in both Data I and Data II. Therefore, there were a total of 1000 QTLs located on a whole genome. The mutation rates assumed per locus per meiosis were 2.5 × 10^{3} and 5.0 × 10^{5} for the marker locus and QTL, respectively. At least one mutation occurred in the most of the marker loci with a high mutation rate during the simulated generations. In the marker loci experiencing more than one mutation, the mutation remaining at the highest minor allele frequency (MAF) was regarded as visible, whereas the others were ignored, which resulted in the marker loci having two alleles similar to SNP markers. Although the mutation rate for QTL was assumed 2.5 × 10^{5} in the simulation for a single trait conducted in [3], we here doubled it for generating TBVs of multiple traits for the reason as described below.
The polymorphic QTLs at which mutation occurred were used to simulate the three traits, A, B and C, the heritabilities of which, denoted by h_{A}^{2}, h_{B}^{2} and h_{C}^{2}, respectively, were assumed to be h_{A}^{2} = 0.8, h_{B}^{2} = 0.1 and h_{C}^{2} = 0.1. We divided all polymorphic QTLs into four groups, Group1, Group2, Group3 and Group4, according to the causal relationship with traits, where Group1 was a group of QTLs affecting both traits A and B, Group2 was a group of QTLs affecting only trait A, Group3 was a group of QTLs affecting only trait B and Group4 was a group of QTLs affecting only trait C. Therefore, it was assumed that QTLs of Group1 had pleiotropic effects on traits A and B while QTLs of other groups affected only a single trait. Each polymorphic QTL was randomly assigned to one of these groups, that is, to Group1, Group2, Group3 and Group4 with respective probabilities 0.4, 0.1, 0.1 and 0.4. Therefore, 80% of the QTL loci affecting trait A were shared by trait B on average and whereas all the QTL loci affecting trait C influence neither traits A nor B. In this setting of multitrait case, the mutation rate of QTL was increased to 5.0 × 10^{5} from 2.5 × 10^{5}, the mutation rate adopted in [3] for generating a single trait, to retain the number of QTL per trait.
The pleiotropic effects of each QTL in Group1 were assumed to be correlated between traits A and B with correlation coefficient of 0.9. Consequently, genetic correlations between traits A and B, between A and C and between B and C, denoted as ρ_{GAB}, ρ_{GAC} and ρ_{GBC}, respectively, were ρ_{GAB} = 0.72 and ρ_{GAC} = ρ_{GBC} = 0 on average although the values of correlation coefficients were somewhat fluctuated in each data generation. This setting of traits was adopted to investigate how the prediction accuracy was increased for a lowheritability trait (trait B) by simultaneous analysis for multiple traits including a correlated highheritability trait (trait A).
The effects of QTL alleles were sampled from gamma distributions independently for each QTL. Pleiotropic effects of QTL alleles in Group1 were determined for traits A and B by generating two correlated gamma random variables, x and y, with the correlation coefficient of 0.9, and assigning them with positive or negative values with equal probabilities. The effects of QTL alleles in the other groups, which affected each single trait only, was determined by sampling a gamma random variable z and by similarly assigning it with the positive or negative sign. We generated these three random variables x, y and z such that their marginal distributions were the same gamma distribution with scale parameter 0.4 and shape parameter 1.66, Gamma(0.4,1.66). For obtaining correlated variables x and y, we generated three independent gamma variables, x_{1}, x_{2} and x_{3}, which were sampled from Gamma(0.36,1.66), Gamma(0.04,1.66) and Gamma(0.04,1.66), respectively, and determined the values of x and y as x = x_{1} + x_{2} and y = x_{1} + x_{3}. It can be shown that x and y had a correlation coefficient of 0.9 and the same marginal distribution Gamma(0.4,1.66) [23].
The environmental correlation coefficients between three traits were denoted by ρ_{EAB}, ρ_{EAC} and ρ_{EBC}, respectively, and assumed as ρ_{EAB} = 0.1, ρ_{EAC} = 0.2 and ρ_{EBC} = 0.3. The environmental effects were sampled from a trivariate normal distribution with a mean vector 0 and a variancecovariance matrix R_{E}, where
with σ_{EA}^{2}, σ_{EB}^{2} and σ_{EC}^{2} indicating environmental variances of three traits. Environmental variance of trait A was given by σ_{EA}^{2} = (1/h_{A}^{2}1)σ_{GA}^{2} with its heritability h_{A}^{2} and genetic variance σ_{GA}^{2}, which was variance of TBV of trait A, and those of traits B and C were obtained similarly. The environmental effects were added to TBVs which were given by the sum of QTL effects to determine phenotypic values of three traits for individuals in the 1001st generation.
In Data I, 100 replicated datasets were simulated while Data II consisted of 20 replicated datasets due to the larger number of SNPs. Each of replicated datasets included records of phenotypes of three traits and genotypes of SNPs for the training population (1001st generation) and only SNP genotypes for the test population (1002nd generation). To simulate the situation of missing phenotypes, we generated additional datasets by deleting the phenotypic records of some traits for some individuals in the 100 replicated training datasets of Data I. These 100 replicated datasets were referred to as Data III, where the phenotypic records of traits A, B and C were respectively deleted for individuals of i = 801–1000, individuals of i = 1500 and individuals of i = 201700 in 1000 individuals (i = 11000) of the 1001st generation of Data I. Therefore, in Data III, the prediction model for GBVs was constructed with a training dataset consisting of the phenotypes of 800, 500 and 500 individuals for traits A, B and C, respectively, where only 100 individuals (i = 701800) had phenotypic records of all three traits, and 1000 individuals in the 1002nd generation were predicted for GBV based on the genotypes of 1010 markers. The setting of nonzero environmental correlations, i.e., ρ_{EAB} = 0.1, ρ_{EAC} = 0.2 and ρ_{EBC} = 0.3, was adopted here to assess the benefit from the estimation process of missing phenotypes implemented in multitrait analyses for the prediction accuracy, where the information of environmental covariance between observed and missing phenotypes was utilized as in (18).
Each replicated dataset in Data I, Data II and Data III was analyzed using the proposed methods for multiple traits, MCBayes and varBayes, to construct the GBV prediction model in the1001st generation and investigate r_{TBV,pGBV} and b_{TBV, pGBV} for each trait in the 1002nd generation. For a comparison with conventional singletrait GBV prediction, the same datasets were also analyzed with the singletrait setting of MCBayes where three traits were separately treated without considering the correlation structure between traits.
We conducted crossvalidation as well to evaluate the prediction performance within a population in the 1001st generation without the 1002nd generation since the techniques of crossvalidation have commonly been used for evaluation of the accuracy in the studies of genomic selection with the actual datasets of animals [2426] and plants [2729], where the prediction accuracy of the model for the unobserved future samples was of concern. We applied 10fold crossvalidation. In brief, we randomly split a population of 1000 individuals in the 1001st generation into ten subpopulations with each size 100. By using a single subpopulation used as a test set and the remaining nine subpopulations as a training set to construct prediction model, the prediction accuracy and bias were assessed for the test set. This process was repeated ten times until all single subpopulations were used as test sets exactly once. We averaged the prediction accuracy and bias over ten repetitions to evaluate the prediction performance in each dataset. Because of much computational burden with MCMC analysis for repeated model constructions in crossvalidation, evaluation with 10fold crossvalidation was carried out for Data I only, where mean of the prediction accuracies and biases in 100 datasets were obtained as r_{TBV,pGBV} and b_{TBV, pGBV} for each trait and for each prediction method. In the process of crossvalidation, we also investigated the correlation between predicted GBV and the phenotypic value, r_{y,pGBV}, in place of TBV, and regression of predicted GBV on phenotypic value, b_{y, pGBV}, for each trait.
Two settings for the prior probability that a SNP has zero effect, π, were adopted in model construction, i.e., π was fixed at 0 or π was variable over the range 0 < π <1 and inferred from the data. The values of hyperparameter ν, degree of freedom of prior inverse Wishart distribution of Σ_{gl}, were set to 5.0 and 3.2 corresponding to π =0 and 0 < π <1 after preliminary analyses to evaluate the effects of the values of ν on prediction accuracy over 3 < ν <6 although the accuracy was little affected by the values of ν.
In the MCMC iteration of MCBayes, we repeated 11000 cycles including a burnin period of the first 1000 cycles. The values of parameters were sampled every 10 cycles to obtain the posterior means that were used to determine a prediction model for each generated dataset. In the method of varBayes, we adopted the criterion < 10^{8} for convergence in numerical iteration for computing the expectations of parameters, where and are the current and previous value of the expectations of parameters. When this criterion was satisfied, the computational procedure with variational approximation was regarded as converged.
Results
We evaluated the accuracy and bias of the predicted GBVs, r_{TBV,pGBV} and b_{TBV, pGBV}, obtained by the proposed methods for genomic selection of multiple traits including MCBayes and varBayes in 100 simulated datasets of Data I and Data III and in 20 datasets of Data II in comparison with the prediction accuracy with singletrait MCBayes analysis. The means of accuracies and biases of prediction evaluated using a test population in the 1002nd generation are summarized in Table 1, Table 2 and Table 3 for Data I, Data II and Data III, respectively. In all of datasets, MCBayes provided higher prediction accuracy than varBayes in multitrait analysis (Tables 1, 2 and 3). In Data I, r_{TBV,pGBV} obtained with multitrait MCBayes analysis was 0.788 (0.051), 0.581 (0.103) and 0.453 (0.090) for traits A, B and C, respectively, when π was fixed at 0, where standard errors were given in parenthesis here and hereafter, while that was 0.753 (0.060), 0.580 (0.117) and 0.364 (0.137) when π was varied and inferred. In Data II with the number of SNPs increased to 10100, r_{TBV,pGBV} with multitrait MCBayes analysis was higher at 0.902 (0.032), 0.706 (0.103) and 0.519 (0.097) for traits A, B and C, respectively, when π was fixed at 0 while that was 0.868 (0.047), 0.731 (0.120) and 0.401 (0.182) when π was inferred. With multitrait varBayes analysis, r_{TBV,pGBV} was 0.754 (0.061), 0.570 (0.113) and 0.383 (0.117) for traits A, B and C with π =0 and that was 0.716 (0.070), 0.548 (0.122) and 0.347 (0.131) with π variable in Data I while that was 0.859 (0.049) , 0.656 (0.110) and 0.438 (0.074) with π =0 and 0.838 (0.061) , 0.678 (0.140) and 0.330 (0.157) with π variable in Data II. The prediction with multitrait MCBayes was almost unbiased in Data I, where b_{TBV,pGBV} was near 1, but was biased in Data II for traits B and C (Tables 1 and 2). The analysis with varBayes showed greater bias than with MCBayes.
Table 1. Accuracy and bias of predicted GBVs in Data I
Table 2. Accuracy and bias of predicted GBVs in Data II
Table 3. Accuracy and bias of predicted GBVs in Data III
In singletrait analysis where MCBayes was applied for a singletrait model, r_{TBV,pGBV} was comparable with that of multitrait analysis for highheritability trait A while it was significantly decreased for lowheritability trait B which was highly correlated with trait A (Table 1 and Table 2). For trait C which had same heritability as trait B but was genetically independent of trait A (ρ_{GAC} = 0.0), multitrait MCBayes analysis gave the accuracy comparable to singletrait MCBayes analysis when π =0 while, when π was varied and inferred, singletrait analysis showed considerably higher accuracy than multitrait MCBayes and varBayes analyses as shown in Table 1 and Table 2. The prediction with singletrait analysis was almost unbiased In Data I, but that was likely to be biased in Data II with b_{TBV,pGBV} much deviated from 1 especially when π was varied and inferred.
In Data III with missing phenotypes, which was derived from Data I by removing some phenotypic records, r_{TBV,pGBV} was decreased for all traits in all methods due to the smaller sample size in comparison with Data I (Table 3). The rate of decrease in r_{TBV,pGBV} was greater for traits B and C than A as the greater reduction of the sample size was made in B and C. Multitrait MCBayes analysis provided higher prediction accuracy than multitrait varBayes analysis for all traits except for trait C withπ varied and inferred. For trait B, multitrait analysis showed higher prediction accuracy than singletrait analysis, but gave less accurate prediction for trait C. For the bias of predicted GBV, b_{TBV, pGBV}, the MCBayes analysis with π =0 was almost unbiased for both multitrait and singletrait analyses, where b_{TBV, pGBV} ranged 0.931 to 0.998, although the standard error of b_{TBV, pGBV} was increased for lowheritability traits B and C, for which the MCBayes analyses with π varied and varBayes showed much biased prediction.
We listed r_{TBV, pGBV} and b_{TBV, pGBV} obtained by crossvalidation conducted in a population of the 1001st generation of Data I as well as r_{y,pGBV} and b_{y,pGBV}, correlation between predicted GBV and phenotype and regression of phenotype on predicted GBV, in Table 4. The prediction accuracy evaluated with crossvalidation was considerably higher than that evaluated with a test population in the next generation, with the prediction bias being similarly for both evaluated with crossvalidation and use of a test population. The relative merits in performance of prediction between the methods were also similar for both evaluations. It was shown in Table 4 that the relational expression, r_{TBV,pGBV} = r_{y,pGBV}/h[30], held with h being a squareroot of heritability and that b_{TBV,pGBV} was almost the same as b_{y,pGBV}.
Table 4. Accuracy and bias of predicted GBVs evaluated with crossvalidation in Data I
Correlation coefficients between predicted GBVs of trait A, B and C were listed in Table 5 for multitrait MCBayes and varBayes analyses and singletrait MCBayes analysis in all datasets as well as the correlation coefficients between simulated breeding values (TBVs) of three traits in the 1002nd generation. Although the correlation of simulated breeding values between traits A and B was expected to be 0.72, the actual correlations obtained were biased upwards: 0.755 (0.133) and 0.735 (0.166) in Data I (III) and Data II, respectively. Multitrait analysis could better estimate these correlations compared to singletrait analysis as seen in the correlation between predicted GBVs between traits A and B. For trait C which was genetically independent of A and B, the correlations between predicted GBVs for traits A and C and traits B and C did not significantly deviate from zero (Table 5).
Table 5. Correlations between predicted GBVs for trait A, B and C
Discussion
In this study, we proposed Bayesian methods for simultaneously predicting GBVs for multiple traits, where two computational procedures were devised using MCMC iteration and variational approximation, referred to as MCBayes and varBayes, respectively. A Bayesian model for simultaneously analyzing multiple traits was obtained by extending a Bayesian model for singletrait genomic selection proposed by [2] and [4] to multitrait case. We introduced the prior probability that a SNP has zero effect, π, and accordingly, MCBayes with π fixed at 0, meaning that all SNPs are included in a model as covariates, constructs a model for multitrait GBV prediction in a similar manner to BayesD. On the other hand, the MCMC procedure of MCBayes with π variable and inferred from the data is not exactly the same as that of BayesDπ, where a prior distribution of the variance and covariance matrix of SNP effects, Σ_{gl}, is assumed to be independent of whether the SNP is included (γ_{l} = 1) or excluded(γ_{l} = 0) from the model in MCBayes, as seen in (4), while it is dependent onγ_{l} and takes different forms for γ_{l} = 1 and γ_{l} = 0 in BayesDπ. This modification allows MCMC iteration of MCBayes to be performed with only Gibbs sampling.
In the simultaneous analysis of multiple traits for constructing a GBV prediction model, the computational burden greatly increases depending on the number of analyzed traits in comparison with singletrait analysis. We developed a variational approximation procedure, varBayes, for MCBayes to reduce the computational time for multitrait analysis. In varBayes, the joint posterior distribution of parameters was approximated by a factorized function, each component of which approximated marginal posterior distribution of each parameter and was referred to as a variational posterior. Variational posteriors were shown to be wellknown distribution functions such as normal or inverse Wishart that could be derived by simple nonMCMC based numerical iteration.
In genomic selection, it is important to construct a model that enables accurate prediction for GVBs. Therefore, precise point estimation of the model parameters is more relevant rather than the construction of their posterior distributions. Accordingly, the evaluation of loss of prediction accuracy with varBayes in comparison with MCBayes would be suitable for the evaluation of approximation accuracy of varBayes. Using simulation experiments, we investigated the performance of the prediction model constructed with multitrait analysis compared with singletrait analysis as well as the model constructed using variational approximation. Moreover, the performance of multitrait analysis in the case of missing phenotypic records commonly occurring in the treatment of the actual data of multiple traits were evaluated based on the results of simulations. These points are discussed below including the computational time and the possible extension of prediction model considering polygenic effects.
Increase in accuracy for GBV prediction with multitrait analysis
We evaluated the increase in prediction accuracy with multitrait analysis in comparison with singletrait analysis using the datasets without missing phenotypes, Data I and Data II (Table 1 and Table 2). For trait A having heritability 0.8, multitrait analysis and singletrait analysis made no difference in the prediction accuracy with MCBayes. Therefore, the advantage of multitrait analysis over singletrait analysis in predicting GBV is negligible for highheritability traits. However, we anticipate the increase in accuracy with multitrait analysis for lowheritability traits utilizing correlations with highheritability traits. Actually, for lowheritability trait B with heritability 0.1, which has highly correlated with trait A (ρ_{GAB} = 0.72), prediction accuracy was increased with multitrait analysis. As trait C was not genetically correlated with trait A, the accuracy of predicting the GBV of C was not improved with multitrait MCBayes analysis. The accuracy of predicted GBV of trait B was also increased with multitrait varBayes analysis in Data I and Data II in comparison with singletrait MCBayes analysis while the prediction accuracy of trait C was lower with multitrait varBayes analysis (Table 1 and Table 2). The genetic correlations between traits A and B were better estimated with predicted GBVs in multitrait analysis than in singletrait analysis as shown in Table 5. Therefore, it can be concluded that lowheritability traits (heritability around 0.1) are better predicted for GBVs utilizing their correlations with highheritability traits (heritability around 0.8), if any, with multitrait analysis. However, the benefit of multitrait analysis would be subtle for highheritability traits. A similar finding was also reported in [13]. For uncorrelated lowheritability traits such as trait C, it is likely that the prediction using multitrait analysis with π varied and inferred is less effective in comparison with singletrait analysis. Therefore, multitrait analysis with π fixed at 0, which allows highly accurate prediction for correlated traits while retaining prediction accuracy comparable with singletrait analysis for uncorrelated lowheritability traits, would be a suitable method of choice for multitrait genomic selection.
Approximation accuracy of variational procedure for MCMC estimation
Generally, constructing a GBV prediction model with MCMC estimation based on genotypic records for tens of thousands of SNPs and phenotypes for hundreds of individuals requires considerable computational time even for singletrait cases. Much more computational burden would be imposed in constructing a model in multitrait analysis, depending on the number of traits of interest. Therefore, we proposed a computationally costeffective method, varBayes, approximating MCMC based method, MCBayes, using a variational approximation procedure.
Simulation experiments showed that the prediction accuracy was lower with varBayes than with MCBayes in multitrait analysis but the rate of loss of accuracy was not remarkable and was less than 10 percent for traits A and B under the same setting of π while it was greater for C (Table 1 and Table 2). The prediction accuracy for trait B correlated with highheritability trait A was still higher with multitrait varBayes analysis than with singletrait MCBayes analysis indicating that varBayes could well utilize the information on the correlation structure in multiple traits. Actually, multitrait varBayes analysis could better capture the genetic correlation between A and B than singletrait MCbayes analysis (Table 5).
The computational time was greatly reduced for multitrait varBayes analysis in comparison with multitrait MCBayes analysis. We carried out all computations using a Fortran program written to implement multipletrait analysis on a computer having two CPUs each with a quadcore processor (Intel Xeon 2.4GHz). In Data I, where 100 replicates of datasets each including genotypes of 1010 SNPs for 1000 individuals were simulated, varBayes took only 12 minutes with π fixed and 22 minutes with π varied for 100 times of model constructions while the computational time for MCBayes was respectively 440 and 435 minutes. Therefore, the average computational time required to construct a prediction model for each dataset in Data I was less than 15 seconds for varBayes and was more than 4 minutes for MCBayes. For a larger data, Data II, that included 20 replicates of datasets each consisting of genotypes of 10100 SNPs for 1000 individuals, the computational time was 21 and 19 minutes for varBayes with π fixed and π varied, respectively, and the time for MCBayes were increased to more than 12 hours with the average computational times for each model construction being about 1 minute for varBayes and more than 30 minutes for MCBayes. In crossvalidation process in which a total of one thousand repetitions of model constructions were performed in Data I, the computational time was about 3days with MCBayes while varBayes completed the analysis within only 4 hours.
Taking computational time and prediction accuracy into account, varBayes is considered a useful method for multitrait genomic selection, which can rapidly construct a prediction model that is less accurate than that with the MCMCbased method for multitrait analysis, but is more accurate than that with singletrait analysis for correlated traits. The usefulness of varBayes would be more remarkable for simultaneous prediction of GBVs of a large number of traits based on a huge number of SNPs where the application of an MCMCbased method might be prohibited.
Multitrait analysis of dataset with missing phenotypes
In Data III, we simulated the datasets under the same condition as Data I except that some phenotypes were assumed to be unobserved. In short, we assumed that phenotypes of traits A, B and C were not available for 200, 500 and 500 individuals, respectively, in a total of 1000 individuals with only 100 individuals having the phenotypes of all three traits. In multitrait analysis, missing phenotypes of individuals can be estimated with their observed phenotypes of other traits using (18), which indicates that residual effects of missing phenotypes can be restored from those of observed phenotypes. When the model fitting is successful for observed phenotypes, the residual effects of the phenotypes are well estimated by subtracting SNP effects and other fixed effects from the phenotypic effects, and those of missing phenotypes are suitably obtained by (18) utilizing the environmental correlation (covariance) between observed and missing phenotypes. Therefore, by assuming nonzero environmental correlation between traits (ρ_{EAB} = 0.1, ρ_{EAC} = 0.2, ρ_{EBC} = 0.3), the loss of prediction accuracy caused by missing phenotypes was anticipated to be less with multitrait analysis than with singletrait analysis. We expected that, in Data III, the prediction accuracy of trait C, environmentally correlated with trait A (ρ_{EAC} = 0.2), was maintained higher in the presence of missing records for some phenotypes using multitrait analysis in comparison with singletrait analysis. However, the rate of loss for the prediction accuracy for trait C was similar between multitrait and singletrait analyses as seen in the comparison of the prediction accuracies between Data I and Data III (Tables 1 and 3). The utility of implementation of (18) for imputing missing phenotypes in the process of multitrait model construction remains unclear in the settings of simulation adopted here.
Model extension by including polygenic effects
We can modify the Bayesian model (1) by including polygenic effects as follows;
where v_{i} is a vector of polygenic effects for multiple traits and assumed to follow a multivariate normal distribution, v_{i} ~ N(0, Σ_{v}) with Σ_{v} being a variancecovariance matrix of polygenic effects. The polygenic effects for all individuals of a training population and a test population are collectively denoted by V, then the variancecovariance matrix of V can be expressed as A ⊗ Σ_{v}, where A is additive genetic relationship matrix for analyzed individuals, computed from the information of pedigree or markers. When the lowdensity markers are used for the analysis, a considerable portion of genetic effects could not be captured by markers. Accordingly, if pedigree information is available, inclusion of polygenic effects estimated based on pedigree is beneficial in predicting breeding values for genomic selection. The revised model (19) can be similarly treated by MCBayes and varBayes as the model (1), where estimation steps for additional covariates and parameters, V and Σ_{v}, can be easily implemented in the procedures of Bayesian multitrait model construction using MCMC iteration and variational approximation. When no pedigree information is available, A is computed from marker information as a genomic relationship matrix. However, the model (19) includes all available markers as covariates, resulting in redundantly using the same marker information in the model fitting. In the availability of highdensity SNP markers, which is the case for some species of animals and plants currently, the genetic relationships between individuals can be well captured by markers themselves in the multilocus model (1), the benefit from the inclusion of the polygenic effects in the model would be subtle [31].
Conclusion
In this study, we described a statistical model for Bayesian simultaneous prediction of GBVs in genomic selection targeting multiple traits and devised an MCMCbased method and a computationally costeffective method utilizing the variational approximation procedure, referred to as MCBayes and varBayes, respectively, to estimate parameters included in the model. The results of simulation experiments showed that the multitrait analysis that could utilize the correlation structure between traits allowed more accurate prediction of GBVs for correlated traits compared to singletrait analysis that treated each trait separately, where, for lowheritability traits correlated with highheritability traits, the prediction accuracy for GBVs was remarkably improved with multitrait analysis. Although the prediction accuracy with varBayes was lower than that with MCBayes in multitrait analysis, the rate of loss in accuracy was moderate and the accuracy for correlated lowheritability traits was still higher with varBayes analysis compared to singletrait analysis. Considering the benefit of greatly reduced computational time, varBayes was considered to be a practical method for predicting GBVs in multitrait genomic selection.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
TH devised Bayesian prediction methods for the genomic selection of multiple traits, developed a program for simulations and drafted the manuscript. HI assisted in developing a program and drafted the final manuscript. Both authors read and approved the final manuscript.
Acknowledgements
This study was supported by the GrantinAid for Scientific Research (B) of Japan Society for the Promotion of Science (Grant No. 22380010).
References

Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES: A robust, simple genotypingbysequencing (GBS) approach for high diversity species.
PLoS One 2011, 6(5):e19379. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Meuwissen THE, Hayes B, Goddard ME: Prediction of total genetic value using genomewide dense marker maps.
Genetics 2001, 157:18191829. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Solberg TR, Sonesson AK, Wooliams JA, Meuwissen THE: Genomic selection using different marker types and densities.
J Anim Sci 2008, 86:24472454. PubMed Abstract  Publisher Full Text

Habier D, Fernando RL, Kizilkaya K, Garrick DJ: Extension of the Bayesian alphabet for genomic selection.
BMC Bioinformatics 2011, 12:186. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Xu S: Estimating polygenic effects using makers of the entire genome.

George EI, McCullogh RE: Variable selection via Gibbs sampling.
J Am Stat Assoc 1993, 88:881889. Publisher Full Text

Gianola D, Delos Campos G, HillW G, Manfredi E, Fernando R: Additive genetic variability and the Bayesian alphabet.
Genetics 2009, 183:347363. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Xu C, Wang X, Li Z, Xu S: Mapping QTL for multiple traits using Bayesian statistics.

Banerjee S, Yandell BS, Yi N: Bayesian Quantitative Trait Loci Mapping for Multiple Traits.
Genetics 2008, 179:22752289. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Meuwissen THE, Goddard ME: Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data.
Genet Sel Evol 2004, 36:261279. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Carlin BP, Chib S: Bayesian model choice via Markov chain Monte Carlo.

Yi N: A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci.
Genetics 2004, 167:967975. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Calus MP, Veerkamp RF: Accuracy of multitrait genomic selection using different methods.
Genet Sel Evol 2011, 43:26. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Yi N, Banerjee S: Hierarchical generalized linear models for multiple quantitative trait locus mapping.
Genetics 2009, 181:11011113. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Logsdon BA, Hoffman G, Mezey J: A variational Bayes algorithm for fast and accurate multiple locus genomewide association analysis.
BMC Bioinformatics 2010, 11:58. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Li Z, Sillanpää MJ: Estimation of quantitative trait locus effects with epistasis by variational Bayes algorithms.
Genetics 2012, 190:231249. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hayashi T, Iwata H: EM algorithm for Bayesian estimation of genomic breeding values.
BMC Genet 2010, 11:3. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Shepherd RK, Meuwissen THE, Wooliams JA: Genomic selection and complex trait prediction using a fast EM algorithm applied to genomewide markers.
BMC Bioinformatics 2010, 11:529. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Attias H: Inferring parameters and structure of latent variable models by variational Bayes. San Francisco, CA: MorganKaufmann; 1999:2130. [Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence]

Bishop CM: Pattern recognition and machine learning. New York: SpringerVerlag; 2006.

Beal M: Variational algorithms for approximate Bayesian inference. University of London: PhD thesis; 2003.

VanTassel CP, VanVleck LD: Multipletrait Gibbs sampler for animal models: Flexible programs for Bayesian and likelihoodbased (co)variance component inference.
J Anim Sci 1996, 74:25862597. PubMed Abstract  Publisher Full Text

Mathai AM, Moschopoulos PG: A form of multivariate gamma distribution.
Ann Inst Stat Math 1992, 44:97106. Publisher Full Text

Moser G, Tier B, Crump RE, Khatkar MS, Raadsma HW: A comparison of five methods to predict genomic breeding values of dairy bulls from genomewide SNP markers.
Genet Sel Evol 2009, 41:56. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Luan T, Woolliams JA, Lien S, Kent M, Svendsen M, Meuwissen THE: The accuracy of genomic selection in Norwegian red cattle assessed by crossvalidation.
Genetics 2009, 183:11191126. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Saatchi M, McClure MC, McKay SD, Megan M, Rolf MM, Kim JW, Decker JE, Taxis TM, Chapple RH, Ramey HR, et al.: Accuracies of genomic breeding values in American Angus beef cattle using Kmeans clustering for crossvalidation.
Genet Sel Evol 2011, 43:40. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Lorenzana RE, Bernardo R: Accuracy of genotypic value predictions for markerbased selection in biparental plant populations.
Theor Appl Genet 2009, 120:151161. PubMed Abstract  Publisher Full Text

Crossa J, Delos Campos G, Pérez P, Gianola D, Burgueño J, Araus JL, Makumbi D, Singh RP, Dreisigacker S, Yan J, et al.: Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers.
Genetics 2010, 186:713724. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Heffner EL, Jannink JL, Iwata H, Souza E, Sorrells ME: Genomic selection accuracy for grain quality traits in biparental wheat populations.
Crop Sci 2011, 51:25972606. Publisher Full Text

Dekkers JCM: Prediction of response to markerassisted and genomic selection using selection index theory.
J Anim Breed Genet 2007, 124:331341. PubMed Abstract  Publisher Full Text

Kärkkäinen HP, Sillanpää MJ: Robustness of Bayesian multilocus association models to cryptic relatedness.
Hum Genet 2012, 76:510523. Publisher Full Text