Agroinformatics Division, National Agriculture and Food Research Organization, Agricultural Research Center, Kannondai, Tsukuba, Ibaraki, 305-8666, Japan
Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo, Tokyo, 113-8657, Japan
Abstract
Background
Genomic selection is an effective tool for animal and plant breeding, allowing individuals to be selected without phenotypic records through the prediction of genomic breeding value (GBV). To date, genomic selection has focused on a single trait. However, actual breeding often targets multiple correlated traits. Joint analysis that takes the correlation between traits into account may predict GBVs more accurately than analyzing each trait separately and is therefore suitable for multi-trait genomic selection. This requires extending the prediction model for single-trait GBV to the multi-trait case. As the computational burden of multi-trait analysis is even higher than that of single-trait analysis, an efficient computational method for constructing a multi-trait prediction model is also needed.
Results
We described a Bayesian regression model incorporating variable selection for jointly predicting GBVs of multiple traits and devised both an MCMC iteration and a variational approximation for Bayesian estimation of parameters in this multi-trait model. The proposed Bayesian procedures with MCMC iteration and variational approximation were referred to as MCBayes and varBayes, respectively. Using simulated datasets of SNP genotypes and phenotypes for three traits with high and low heritabilities, we compared the accuracy in predicting GBVs between multi-trait and single-trait analyses as well as between MCBayes and varBayes. Compared to single-trait analysis, multi-trait analysis enabled much more accurate GBV prediction for low-heritability traits correlated with high-heritability traits by utilizing the correlation structure between traits. For uncorrelated low-heritability traits, the prediction accuracy with multi-trait analysis was comparable to or lower than that with single-trait analysis, depending on the setting for the prior probability that a SNP has zero effect. Although the prediction accuracy with varBayes was generally lower than with MCBayes, the loss in accuracy was slight. The computational time was greatly reduced with varBayes.
Conclusions
In genomic selection for multiple correlated traits, multi-trait analysis was more beneficial than single-trait analysis. varBayes had a large advantage over MCBayes in computational time, which would outweigh the loss of prediction accuracy caused by the approximation procedure, and is thus considered a practical method of choice.
Background
A huge number of genome-wide polymorphisms have recently been elucidated in livestock and crops with the development of sequencing technologies. High-throughput genotyping systems, such as high-density SNP chips containing several tens or hundreds of thousands of genome-wide SNP markers and GBS (genotyping by sequencing)
In the original study of genomic selection by Meuwissen et al.
In BayesA and BayesB, an additional hierarchical structure is induced for this SNP-specific variance, where an inverted chi-square distribution with degree of freedom
These Bayesian methods were mainly developed for genomic selection of a single trait. However, actual breeding of animals and plants often aims to simultaneously improve multiple correlated traits. Therefore, joint prediction of GBVs for multiple traits, taking into consideration the correlation structure between traits, is suitable for multi-trait genomic selection, which requires the extension of existing methods for single-trait GBV prediction to the multi-trait case. In QTL mapping methods that use similar models to those of genomic selection, some researchers have developed multi-trait models
MCMC iteration is generally used to estimate the parameters of Bayesian models, which become complicated in genomic selection because a huge number of SNPs are included as covariates and the SNP effects are estimated as their regression coefficients. The computational burden of MCMC-based Bayesian methods, which require a long time to converge when many parameters are estimated, is huge even in single-trait GBV prediction and increases further with the number of traits to be jointly analyzed, which hinders the use of the MCMC procedure. Therefore, it is necessary to devise a way of reducing the computational burden of Bayesian methods for multi-trait GBV prediction. So far, some non-MCMC computational procedures for Bayesian methods have been proposed in QTL mapping and genome-wide association study, including EM-algorithm
In this paper, we propose Bayesian methods for multi-trait GBV prediction. We develop Bayesian shrinkage regression (BSR) models that allow variable selection, describe MCMC procedures for estimating the model parameters including SNP effects, and present a computationally cost-effective non-MCMC method using variational approximation as an alternative computational procedure. Hereafter, the Bayesian methods based on MCMC iteration and variational approximation are referred to as MCBayes and varBayes, respectively. The multi-trait Bayesian models described here include BSR models, which are equivalent to those adopted by BayesA and BayesD when variable selection is not conducted, and the models of Bayesian stochastic search variable selection (BSSVS), which are regarded as slightly modified versions of BayesB and BayesD
Using simulated datasets consisting of genotypes of genome-wide dense SNPs and phenotypes of three correlated traits with high and low heritabilities, we investigated the differences in prediction accuracy for each trait between multi-trait analysis, where GBVs of three traits were simultaneously predicted taking the correlation structure between traits into consideration, and single-trait analysis, where each trait was separately predicted for GBV. We also evaluated the prediction accuracy of the varBayes methods in comparison with MCBayes. Moreover, we investigated the performance of multi-trait analysis in simulated data including missing phenotypes.
Methods
In this section, we describe Bayesian models for multi-trait GBV prediction and computational procedures for Bayesian estimation of the model parameters, including construction of posterior distributions of the parameters using MCMC iteration and variational approximation. Here, we consider the statistical models for BSR and BSSVS, which are shown to be equivalent to BayesD and similar to BayesD
We assume that the number of SNPs genotyped is
Models for Bayesian stochastic search variable selection in multi-trait genomic selection
We propose the following Bayesian multi-locus linear model for the phenotypes of
where b is a
Within the Bayesian framework, prior distributions are assigned to the parameters of the model (1). We assume that the priors of the elements of b are the improper uniform distribution over the possible values. The prior probabilities that
where Σ_{gl} is a
Although the SNP effect g_{l} is zero and irrelevant to Σ_{gl} when
Denoting these parameters of the Bayesian model collectively by θ, the prior distribution of θ by
from (1), (2) and (3), where y_{i}* is a residual given by
Given that S is fixed, posterior distribution (4) is equivalent to that of BayesA extended to the multi-trait case when
Derivation of full conditional posterior distributions of parameters in a statistical model.
Variational approximation procedure for multi-trait Bayesian model
We adopted variational approximation as an alternative to MCMC iteration for constructing marginal posterior distributions of parameters based on joint posterior distribution (4). In the variational approximation procedure, the joint posterior distribution is approximated by the product of functions for subsets of parameters with lower dimension. Briefly, we assume that
It can be shown that
where
In the varBayes method that applies the variational approximation procedure to the multi-trait Bayesian model considered here, g(θ|
where we denote all of the variational posteriors for the different parameters in the right-hand side by
Derivation of variational posteriors for parameters in a statistical model.
The variational posterior for b,
and variance-covariance matrix
where
from which we obtain
The variational posterior of Σ_{gl} is represented by IW_{T} (
and
The expectation of Σ_{gl}^{-1} with respect to this posterior distribution is
For
where
and variance-covariance matrix
The marginal posterior distribution of
and
The conditional distribution of g_{l} given
Therefore, we obtain
and
From (4), the variational posterior of S is a
As outlined above, a well-known probability distribution, such as normal, inverse Wishart and so on, is assigned to the variational posterior of each parameter, which is characterized by the expectations of the functions of other parameters, taken with respect to their variational posteriors. The relationships between these expectations are given by (6), (8), (9), (13), (14), (15) and (16), from which the expectations can be calculated with numerical iterations to obtain the variational posteriors.
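As a minimal illustration of this kind of numerical iteration, the sketch below applies a mean-field variational approximation to a toy univariate normal model with unknown mean and precision, not to the multi-trait model (4). The function name `mean_field_normal`, the prior hyperparameters `mu0`, `lam0`, `a0` and `b0`, and the convergence tolerance are assumptions made for illustration only. Each variational posterior (normal for the mean, gamma for the precision) depends on an expectation taken over the other, and the expectations are updated in turn until they stop changing.

```python
def mean_field_normal(x, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3,
                      tol=1e-10, max_iter=1000):
    """Toy mean-field variational Bayes for x_i ~ N(mu, 1/tau).

    Assumed priors: mu | tau ~ N(mu0, 1/(lam0 * tau)), tau ~ Gamma(a0, b0).
    The joint posterior is approximated by q(mu) * q(tau), where
    q(mu) = N(mu_n, 1/lam_n) and q(tau) = Gamma(a_n, b_n).
    """
    n = len(x)
    xbar = sum(x) / n
    # These two quantities are fixed by the data and the priors.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0
    e_tau = 1.0  # starting value for E[tau]
    b_n = b0
    for _ in range(max_iter):
        # q(mu) depends on q(tau) only through E[tau].
        lam_n = (lam0 + n) * e_tau
        # q(tau) depends on q(mu) through expected squared deviations.
        e_ss = sum((xi - mu_n) ** 2 for xi in x) + n / lam_n
        e_prior = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)
        b_n = b0 + 0.5 * (e_ss + e_prior)
        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) <= tol * abs(e_tau)
        e_tau = new_e_tau
        if converged:
            break
    return {"mu_n": mu_n, "lam_n": (lam0 + n) * e_tau, "a_n": a_n, "b_n": b_n}


if __name__ == "__main__":
    fit = mean_field_normal([1.0, 2.0, 3.0, 4.0, 5.0])
    # Variational posterior mean of mu and expectation of tau.
    print(fit["mu_n"], fit["a_n"] / fit["b_n"])
```

With nearly flat priors, the variational mean of `mu` is close to the sample mean and `E[tau]` is close to the reciprocal of the sample variance, which gives a quick sanity check on the iteration.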
Moreover, the prior probability for a SNP to have zero effect,
Accordingly, the variational posteriors of the other parameters are modified with
Treatment of missing phenotypes
In the phenotypic records for multiple traits, it is common for trait values to be partially missing in some individuals. Missing phenotypes of a trait in individuals can be inferred with the observed phenotypes of other traits in the same individuals. The step for inferring missing phenotypes can be implemented in the MCBayes and varBayes procedures as described below.
When there are missing phenotypes in an individual, the residual vector of the individual, e in model (1), is partitioned into components e_{o} and e_{m} corresponding to observed and missing phenotypes, respectively. Following
where Σ_{mm}, Σ_{mo} and Σ_{oo} indicate the partition of the residual variance-covariance matrix Σ_{e} corresponding to e_{m} and e_{o}. Accordingly, e_{m} is drawn with Gibbs sampling in MCBayes while it is obtained as E(Σ_{mo})E(Σ_{oo})^{-1}e_{o}* in varBayes, where, denoting the component corresponding to the observed traits by subscript ‘o’, e_{o}* is written as
and the expectations can be calculated with the variational posteriors of Σ_{e}, b and g_{l}. Missing phenotypes are inferred as the sum of the estimates of b, g_{l} and e_{m}, which are used for the construction of the prediction model.
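The conditional-mean step for missing residuals can be sketched as follows. This is a simplified illustration that plugs a single fixed residual covariance matrix into Σ_{mo}Σ_{oo}^{-1}e_{o}, whereas varBayes uses expectations taken with respect to the variational posteriors. The function names `solve` and `impute_missing_residuals` and the example numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def impute_missing_residuals(sigma_e, e_obs, obs_idx, mis_idx):
    """Return Sigma_mo Sigma_oo^{-1} e_o for one individual.

    sigma_e : residual covariance matrix over all traits (list of lists)
    e_obs   : residuals of the observed traits, ordered as in obs_idx
    obs_idx : trait indices with observed phenotypes
    mis_idx : trait indices with missing phenotypes
    """
    sigma_oo = [[sigma_e[i][j] for j in obs_idx] for i in obs_idx]
    w = solve(sigma_oo, e_obs)  # Sigma_oo^{-1} e_o
    return [sum(sigma_e[m][o] * w[k] for k, o in enumerate(obs_idx))
            for m in mis_idx]


if __name__ == "__main__":
    # Three traits; the first two share a residual covariance of 0.5.
    sigma_e = [[1.0, 0.5, 0.0],
               [0.5, 1.0, 0.0],
               [0.0, 0.0, 1.0]]
    # Only the first trait is observed, with residual 2.0.
    print(impute_missing_residuals(sigma_e, [2.0], [0], [1, 2]))
```

In the example, the missing residual of the second trait borrows information from the first through their covariance, while the third trait, which is uncorrelated with the first, receives a residual of zero.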
Simulation experiments
We simulated datasets to evaluate the accuracy of the predicted GBVs using the proposed Bayesian methods, MCBayes and varBayes, for multiple traits. In generating the datasets, three traits, denoted as A, B and C, were considered. The simulation of population and genome was carried out following
Populations with an effective population size of 100 were maintained by random mating for 1000 generations to attain mutation-drift balance and LD between SNPs and QTLs. In generations 1001 and 1002, the population size was increased to 1000. The population in the 1001st generation was treated as a training population, where the phenotypes of three traits and SNP genotypes of the individuals were simulated and analyzed to estimate the SNP effects in the model. The phenotype of each trait for each individual in the 1001st generation was given as the sum of QTL effects over the polymorphic QTLs and environmental effects, which were sampled as described later. For simplicity, no other fixed effects were assumed. The population in the 1002nd generation was used as a test population, where the individuals were only genotyped for SNP markers without phenotypic records and GBVs of three traits were predicted for each individual using a model with SNP effects estimated based on the training population in the 1001st generation. The true breeding value (TBV) of each individual in the 1002nd generation was also simulated as the sum of QTL effects corresponding to the QTL genotype for each trait and used for evaluating the accuracy of predicted GBV, but was regarded as unknown and unavailable in the estimation of SNP effects in the models. Accuracy was measured based on the correlation between the TBV and predicted GBV,
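The accuracy measure named above, the correlation between TBV and predicted GBV, can be computed as in the following sketch. The second returned value, the slope of the regression of TBV on predicted GBV, is included as one common way of quantifying bias in genomic prediction; using it here is an assumption, as are the function name `accuracy_and_slope` and the example values.

```python
from math import sqrt


def accuracy_and_slope(tbv, gbv):
    """Return (Pearson correlation, regression slope of TBV on GBV)."""
    n = len(tbv)
    mean_t = sum(tbv) / n
    mean_g = sum(gbv) / n
    s_tg = sum((t - mean_t) * (g - mean_g) for t, g in zip(tbv, gbv))
    s_tt = sum((t - mean_t) ** 2 for t in tbv)
    s_gg = sum((g - mean_g) ** 2 for g in gbv)
    r = s_tg / sqrt(s_tt * s_gg)  # prediction accuracy
    slope = s_tg / s_gg           # a slope of 1 suggests no scale bias
    return r, slope


if __name__ == "__main__":
    tbv = [0.0, 2.0, 4.0, 6.0]
    gbv = [0.0, 1.0, 2.0, 3.0]  # perfectly ranked but shrunk by half
    print(accuracy_and_slope(tbv, gbv))
```

In this toy case the predictions rank the individuals perfectly, so the correlation is 1, while the slope of 2 shows that the predicted values are shrunk relative to the true values.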
The genome was assumed to consist of 10 chromosomes, each 100 cM in length. Two scenarios were considered for the number of available SNP markers and the datasets under these two scenarios were denoted as Data I and Data II. In Data I, 101 marker loci were located every 1 cM on each chromosome for a total of 1010 markers on a genome. In Data II, 1010 equidistant marker loci were located on each chromosome for a total of 10100 markers. We assumed that 100 equidistant QTLs were located on each chromosome such that a QTL was in the middle between two marker loci in both Data I and Data II. Therefore, there were a total of 1000 QTLs located on a whole genome. The mutation rates assumed per locus per meiosis were 2.5 × 10^{-3} and 5.0 × 10^{-5} for the marker locus and QTL, respectively. At least one mutation occurred in most of the marker loci with a high mutation rate during the simulated generations. In the marker loci experiencing more than one mutation, the mutation remaining at the highest minor allele frequency (MAF) was regarded as visible, whereas the others were ignored, which resulted in the marker loci having two alleles similar to SNP markers. Although the mutation rate for QTL was assumed to be 2.5 × 10^{-5} in the simulation for a single trait conducted in
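The marker and QTL layout of Data I can be written down directly from the description above. The sketch below is illustrative only: the function names are hypothetical, and positions are expressed in cM from the start of each chromosome. For Data II only the marker count is checked, because the exact marker positions are not spelled out here.

```python
N_CHROMOSOMES = 10
CHROMOSOME_LENGTH_CM = 100


def data1_marker_positions():
    """Markers every 1 cM from 0 to 100 cM: 101 per chromosome."""
    return [float(i) for i in range(CHROMOSOME_LENGTH_CM + 1)]


def qtl_positions():
    """100 equidistant QTLs, each midway between two adjacent Data I markers."""
    return [i + 0.5 for i in range(CHROMOSOME_LENGTH_CM)]


if __name__ == "__main__":
    markers = data1_marker_positions()
    qtls = qtl_positions()
    print(len(markers) * N_CHROMOSOMES)  # markers per genome in Data I
    print(len(qtls) * N_CHROMOSOMES)     # QTLs per genome
    print(1010 * N_CHROMOSOMES)          # markers per genome in Data II
```

The totals reproduce the counts stated in the text: 1010 markers in Data I, 10100 markers in Data II and 1000 QTLs.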
The polymorphic QTLs at which mutation occurred were used to simulate the three traits, A, B and C, the heritabilities of which, denoted by
The pleiotropic effects of each QTL in Group1 were assumed to be correlated between traits A and B with correlation coefficient of 0.9. Consequently, genetic correlations between traits A and B, between A and C and between B and C, denoted as
The effects of QTL alleles were sampled from gamma distributions independently for each QTL. Pleiotropic effects of QTL alleles in Group1 were determined for traits A and B by generating two correlated gamma random variables,
The environmental correlation coefficients between three traits were denoted by
with
In Data I, 100 replicated datasets were simulated while Data II consisted of 20 replicated datasets due to the larger number of SNPs. Each replicated dataset included records of phenotypes of three traits and genotypes of SNPs for the training population (1001st generation) and only SNP genotypes for the test population (1002nd generation). To simulate the situation of missing phenotypes, we generated additional datasets by deleting the phenotypic records of some traits for some individuals in the 100 replicated training datasets of Data I. These 100 replicated datasets were referred to as Data III, where the phenotypic records of traits A, B and C were respectively deleted for individuals of
Each replicated dataset in Data I, Data II and Data III was analyzed using the proposed methods for multiple traits, MCBayes and varBayes, to construct the GBV prediction model in the 1001st generation and investigate
We conducted cross-validation as well to evaluate the prediction performance within a population in the 1001st generation without the 1002nd generation since the techniques of cross-validation have commonly been used for evaluation of the accuracy in the studies of genomic selection with the actual datasets of animals
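A minimal sketch of how such a cross-validation partition can be formed for the 1000 individuals of the 1001st generation is given below. It assumes a random 10-fold split, in line with the 10-fold cross-validation reported in the results; the function names, the seed and the interleaved fold assignment are illustrative choices rather than the procedure actually used.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n, k=10, seed=0):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = k_fold_indices(n, k, seed)
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


if __name__ == "__main__":
    for train, test in train_test_splits(1000):
        # Fit the prediction model on `train`, predict GBVs for `test`.
        print(len(train), len(test))
```

With 1000 individuals and 10 folds, each model is fitted on 900 individuals and evaluated on the remaining 100.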
Two settings for the prior probability that a SNP has zero effect,
In the MCMC iteration of MCBayes, we repeated 11000 cycles including a burn-in period of the first 1000 cycles. The values of parameters were sampled every 10 cycles to obtain the posterior means that were used to determine a prediction model for each generated dataset. In the method of varBayes, we adopted the criterion
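The MCMC schedule described above (11000 cycles, a burn-in of 1000 cycles, sampling every 10 cycles) leaves 1000 retained draws per parameter. The sketch below shows this bookkeeping. The function names are hypothetical, and the assumption that the first retained draw is cycle 1010 is only one possible convention.

```python
def retained_cycles(n_cycles=11000, burn_in=1000, thin=10):
    """Return the 1-based cycle numbers kept after burn-in and thinning."""
    return [c for c in range(burn_in + 1, n_cycles + 1)
            if (c - burn_in) % thin == 0]


def posterior_mean(chain, burn_in=1000, thin=10):
    """Average the retained draws of one scalar parameter."""
    kept = [chain[c - 1] for c in retained_cycles(len(chain), burn_in, thin)]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    cycles = retained_cycles()
    print(len(cycles), cycles[0], cycles[-1])
```

The posterior means used in the prediction model are then averages over these retained draws.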
Results
We evaluated the accuracy and bias of the predicted GBVs,
Averages and standard errors based on 100 replicates of simulated data are listed for prediction accuracy,
Method | Trait A | Trait B | Trait C
MCBayes | 0.788 ± 0.051 | 0.581 ± 0.103 | 0.453 ± 0.090
 | 0.994 ± 0.038 | 1.048 ± 0.264 | 1.00 ± 0.370
0 < | 0.753 ± 0.060 | 0.580 ± 0.117 | 0.364 ± 0.137
 | 1.070 ± 0.064 | 1.149 ± 0.340 | 1.016 ± 0.364
varBayes | 0.754 ± 0.061 | 0.570 ± 0.113 | 0.383 ± 0.117
 | 1.054 ± 0.051 | 0.994 ± 0.233 | 0.899 ± 0.247
0 < | 0.716 ± 0.070 | 0.548 ± 0.122 | 0.347 ± 0.131
 | 0.894 ± 0.054 | 0.834 ± 0.186 | 0.636 ± 0.202
single-trait (MCBayes) | 0.783 ± 0.051 | 0.469 ± 0.083 | 0.455 ± 0.076
 | 0.978 ± 0.037 | 1.020 ± 0.301 | 0.970 ± 0.259
0 < | 0.778 ± 0.050 | 0.491 ± 0.114 | 0.483 ± 0.101
 | 1.089 ± 0.054 | 1.110 ± 0.634 | 1.061 ± 0.338
Averages and standard errors based on 20 replicates of simulated data are listed for prediction accuracy,
Method | Trait A | Trait B | Trait C
MCBayes | 0.902 ± 0.032 | 0.706 ± 0.103 | 0.519 ± 0.097
 | 0.998 ± 0.034 | 0.902 ± 0.111 | 0.796 ± 0.179
0 < | 0.868 ± 0.047 | 0.731 ± 0.120 | 0.401 ± 0.182
 | 1.092 ± 0.093 | 1.189 ± 0.199 | 1.198 ± 0.553
varBayes | 0.859 ± 0.049 | 0.656 ± 0.110 | 0.438 ± 0.074
 | 1.059 ± 0.065 | 0.799 ± 0.105 | 0.724 ± 0.111
0 < | 0.838 ± 0.061 | 0.678 ± 0.140 | 0.330 ± 0.157
 | 0.983 ± 0.034 | 0.851 ± 0.138 | 0.562 ± 0.155
single-trait (MCBayes) | 0.884 ± 0.039 | 0.485 ± 0.086 | 0.493 ± 0.089
 | 0.974 ± 0.035 | 0.766 ± 0.113 | 0.766 ± 0.113
0 < | 0.843 ± 0.044 | 0.597 ± 0.120 | 0.601 ± 0.109
 | 1.562 ± 0.261 | 1.787 ± 0.431 | 1.832 ± 0.565
Averages and standard errors based on 100 replicates of simulated data are listed for prediction accuracy,
Method | Trait A | Trait B | Trait C
MCBayes | 0.766 ± 0.058 | 0.500 ± 0.127 | 0.322 ± 0.082
 | 0.977 ± 0.048 | 0.998 ± 0.356 | 0.967 ± 0.773
0 < | 0.723 ± 0.069 | 0.503 ± 0.141 | 0.202 ± 0.119
 | 1.065 ± 0.076 | 1.195 ± 0.530 | 0.799 ± 0.523
varBayes | 0.726 ± 0.072 | 0.447 ± 0.131 | 0.261 ± 0.134
 | 0.984 ± 0.052 | 0.582 ± 0.241 | 0.383 ± 0.181
0 < | 0.679 ± 0.081 | 0.387 ± 0.115 | 0.228 ± 0.112
 | 0.840 ± 0.068 | 0.389 ± 0.132 | 0.240 ± 0.110
single-trait (MCBayes) | 0.760 ± 0.058 | 0.345 ± 0.070 | 0.336 ± 0.068
 | 0.965 ± 0.047 | 0.931 ± 0.368 | 0.969 ± 0.510
0 < | 0.758 ± 0.057 | 0.362 ± 0.105 | 0.354 ± 0.101
 | 1.086 ± 0.068 | 1.455 ± 1.401 | 1.251 ± 1.310
In single-trait analysis where MCBayes was applied for a single-trait model,
In Data III with missing phenotypes, which was derived from Data I by removing some phenotypic records,
We listed
Averages and standard errors evaluated with 10-fold cross-validation, based on 100 replicates of simulated data in Data I, are listed for prediction accuracy,
Method | Trait A | Trait B | Trait C
MCBayes | 0.832 ± 0.039 | 0.611 ± 0.095 | 0.501 ± 0.083
 | 1.016 ± 0.022 | 1.072 ± 0.231 | 1.052 ± 0.341
 | 0.741 ± 0.037 | 0.191 ± 0.045 | 0.160 ± 0.050
 | 1.013 ± 0.015 | 1.062 ± 0.125 | 1.039 ± 0.130
0 < | 0.791 ± 0.048 | 0.603 ± 0.112 | 0.390 ± 0.127
 | 1.132 ± 0.064 | 1.210 ± 0.303 | 1.180 ± 0.379
 | 0.705 ± 0.045 | 0.191 ± 0.046 | 0.121 ± 0.060
 | 1.131 ± 0.065 | 1.210 ± 0.170 | 1.119 ± 0.373
varBayes | 0.813 ± 0.049 | 0.620 ± 0.108 | 0.470 ± 0.118
 | 1.080 ± 0.048 | 0.994 ± 0.157 | 0.963 ± 0.201
 | 0.722 ± 0.047 | 0.187 ± 0.056 | 0.143 ± 0.058
 | 1.072 ± 0.049 | 0.945 ± 0.195 | 0.931 ± 0.289
0 < | 0.779 ± 0.059 | 0.593 ± 0.111 | 0.423 ± 0.123
 | 0.944 ± 0.040 | 0.816 ± 0.139 | 0.662 ± 0.153
 | 0.690 ± 0.056 | 0.180 ± 0.055 | 0.125 ± 0.062
 | 0.935 ± 0.039 | 0.787 ± 0.166 | 0.626 ± 0.255
single-trait (MCBayes) | 0.826 ± 0.040 | 0.515 ± 0.073 | 0.505 ± 0.074
 | 0.997 ± 0.023 | 1.073 ± 0.270 | 1.030 ± 0.303
 | 0.735 ± 0.039 | 0.159 ± 0.045 | 0.162 ± 0.044
 | 0.993 ± 0.012 | 1.071 ± 0.162 | 1.055 ± 0.222
0 < | 0.821 ± 0.039 | 0.531 ± 0.099 | 0.522 ± 0.094
 | 1.131 ± 0.046 | 1.265 ± 0.555 | 1.192 ± 0.405
 | 0.731 ± 0.037 | 0.164 ± 0.051 | 0.164 ± 0.048
 | 1.127 ± 0.043 | 1.249 ± 0.482 | 1.205 ± 0.457
Correlation coefficients between predicted GBVs of trait A, B and C were listed in Table
Averages and standard errors are listed based on 100 replicates of simulated data in Data I and Data III and 20 replicates in Data II. Simulated BV indicates simulated breeding values, where expected correlations are 0.72, 0.0 and 0.0 for trait-pairs A-B, A-C and B-C as listed in parentheses. In Data III, correlations between simulated breeding values are the same as those in Data I.
Data | Method | A-B (0.72) | A-C (0.0) | B-C (0.0)
I | MCBayes | 0.588 ± 0.181 | −0.071 ± 0.175 | −0.091 ± 0.199
 | 0 < | 0.699 ± 0.188 | −0.057 ± 0.250 | −0.076 ± 0.301
 | varBayes | 0.688 ± 0.184 | −0.100 ± 0.241 | −0.077 ± 0.279
 | 0 < | 0.644 ± 0.169 | −0.053 ± 0.192 | −0.058 ± 0.247
 | Single-trait (MCBayes) | 0.446 ± 0.132 | 0.058 ± 0.096 | 0.096 ± 0.121
 | 0 < | 0.452 ± 0.183 | 0.035 ± 0.090 | 0.060 ± 0.111
 | Simulated BV | 0.755 ± 0.133 | 0.003 ± 0.054 | 0.004 ± 0.060
II | MCBayes | 0.606 ± 0.197 | −0.129 ± 0.113 | −0.071 ± 0.140
 | 0 < | 0.721 ± 0.222 | −0.161 ± 0.149 | −0.128 ± 0.179
 | varBayes | 0.614 ± 0.206 | −0.176 ± 0.135 | −0.090 ± 0.179
 | 0 < | 0.671 ± 0.210 | −0.153 ± 0.168 | −0.047 ± 0.193
 | Single-trait (MCBayes) | 0.394 ± 0.115 | 0.020 ± 0.075 | 0.072 ± 0.111
 | 0 < | 0.479 ± 0.206 | −0.018 ± 0.084 | 0.022 ± 0.094
 | Simulated BV | 0.735 ± 0.166 | −0.031 ± 0.054 | −0.012 ± 0.041
III | MCBayes | 0.530 ± 0.195 | −0.032 ± 0.223 | −0.037 ± 0.238
 | 0 < | 0.662 ± 0.208 | −0.014 ± 0.326 | −0.013 ± 0.369
 | varBayes | 0.555 ± 0.194 | −0.028 ± 0.218 | −0.023 ± 0.248
 | 0 < | 0.455 ± 0.167 | −0.001 ± 0.158 | 0.021 ± 0.190
 | Single-trait (MCBayes) | 0.315 ± 0.106 | 0.029 ± 0.105 | 0.080 ± 0.134
 | 0 < | 0.323 ± 0.138 | 0.024 ± 0.097 | 0.057 ± 0.130
Discussion
In this study, we proposed Bayesian methods for simultaneously predicting GBVs for multiple traits, where two computational procedures were devised using MCMC iteration and variational approximation, referred to as MCBayes and varBayes, respectively. A Bayesian model for simultaneously analyzing multiple traits was obtained by extending a Bayesian model for single-trait genomic selection proposed by
In the simultaneous analysis of multiple traits for constructing a GBV prediction model, the computational burden increases greatly with the number of analyzed traits in comparison with single-trait analysis. We developed a variational approximation procedure, varBayes, as an alternative to MCBayes to reduce the computational time for multi-trait analysis. In varBayes, the joint posterior distribution of parameters was approximated by a factorized function, each component of which approximated the marginal posterior distribution of a parameter and was referred to as a variational posterior. Variational posteriors were shown to be well-known distribution functions, such as normal or inverse Wishart, that could be derived by simple non-MCMC numerical iteration.
In genomic selection, it is important to construct a model that enables accurate prediction of GBVs. Therefore, precise point estimation of the model parameters is more relevant than the construction of their posterior distributions. Accordingly, the loss of prediction accuracy with varBayes in comparison with MCBayes is a suitable measure of the approximation accuracy of varBayes. Using simulation experiments, we investigated the performance of the prediction model constructed with multi-trait analysis compared with single-trait analysis as well as the model constructed using variational approximation. Moreover, the performance of multi-trait analysis in the case of missing phenotypic records, which commonly occur in actual data of multiple traits, was evaluated based on the results of simulations. These points are discussed below, including the computational time and the possible extension of the prediction model to consider polygenic effects.
Increase in accuracy for GBV prediction with multi-trait analysis
We evaluated the increase in prediction accuracy with multi-trait analysis in comparison with single-trait analysis using the datasets without missing phenotypes, Data I and Data II (Table
Approximation accuracy of variational procedure for MCMC estimation
Generally, constructing a GBV prediction model with MCMC estimation based on genotypic records for tens of thousands of SNPs and phenotypes for hundreds of individuals requires considerable computational time even for single-trait cases. A much greater computational burden would be imposed in constructing a model in multi-trait analysis, depending on the number of traits of interest. Therefore, we proposed a computationally cost-effective method, varBayes, which approximates the MCMC-based method, MCBayes, using a variational approximation procedure.
Simulation experiments showed that the prediction accuracy was lower with varBayes than with MCBayes in multi-trait analysis but the rate of loss of accuracy was not remarkable and was less than 10 percent for traits A and B under the same setting of
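The relative loss of accuracy can be checked with simple arithmetic on the reported averages. The sketch below uses the first accuracy row listed for MCBayes and for varBayes in the Data I results table above (100 replicates, no missing phenotypes). Treating these two rows as belonging to the same prior setting is an assumption, and the function name `relative_loss` is hypothetical.

```python
def relative_loss(acc_mcmc, acc_var):
    """Fraction of the MCBayes accuracy lost when using varBayes."""
    return (acc_mcmc - acc_var) / acc_mcmc


# (MCBayes accuracy, varBayes accuracy), copied from the Data I table.
data1_first_rows = {
    "A": (0.788, 0.754),
    "B": (0.581, 0.570),
}

if __name__ == "__main__":
    for trait, (mc, vb) in data1_first_rows.items():
        print(trait, round(100 * relative_loss(mc, vb), 1), "percent")
```

For both traits the relative loss computed this way stays below 10 percent, in line with the statement above.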
The computational time was greatly reduced for multi-trait varBayes analysis in comparison with multi-trait MCBayes analysis. We carried out all computations using a Fortran program written to implement multiple-trait analysis on a computer having two CPUs each with a quad-core processor (Intel Xeon 2.4GHz). In Data I, where 100 replicates of datasets each including genotypes of 1010 SNPs for 1000 individuals were simulated, varBayes took only 12 minutes with
Taking computational time and prediction accuracy into account, varBayes is considered a useful method for multi-trait genomic selection, which can rapidly construct a prediction model that is less accurate than that with the MCMC-based method for multi-trait analysis, but is more accurate than that with single-trait analysis for correlated traits. The usefulness of varBayes would be more remarkable for simultaneous prediction of GBVs of a large number of traits based on a huge number of SNPs where the application of an MCMC-based method might be prohibited.
Multi-trait analysis of dataset with missing phenotypes
In Data III, we simulated the datasets under the same condition as Data I except that some phenotypes were assumed to be unobserved. In short, we assumed that phenotypes of traits A, B and C were not available for 200, 500 and 500 individuals, respectively, in a total of 1000 individuals with only 100 individuals having the phenotypes of all three traits. In multi-trait analysis, missing phenotypes of individuals can be estimated with their observed phenotypes of other traits using (18), which indicates that residual effects of missing phenotypes can be restored from those of observed phenotypes. When the model fitting is successful for observed phenotypes, the residual effects of the phenotypes are well estimated by subtracting SNP effects and other fixed effects from the phenotypic effects, and those of missing phenotypes are suitably obtained by (18) utilizing the environmental correlation (covariance) between observed and missing phenotypes. Therefore, by assuming non-zero environmental correlation between traits (
Model extension by including polygenic effects
We can modify the Bayesian model (1) by including polygenic effects as follows:
where v_{i} is a vector of polygenic effects for multiple traits and assumed to follow a multivariate normal distribution, v_{i} ~
Conclusion
In this study, we described a statistical model for Bayesian simultaneous prediction of GBVs in genomic selection targeting multiple traits and devised an MCMC-based method and a computationally cost-effective method utilizing the variational approximation procedure, referred to as MCBayes and varBayes, respectively, to estimate parameters included in the model. The results of simulation experiments showed that the multi-trait analysis that could utilize the correlation structure between traits allowed more accurate prediction of GBVs for correlated traits compared to single-trait analysis that treated each trait separately, where, for low-heritability traits correlated with high-heritability traits, the prediction accuracy for GBVs was remarkably improved with multi-trait analysis. Although the prediction accuracy with varBayes was lower than that with MCBayes in multi-trait analysis, the rate of loss in accuracy was moderate and the accuracy for correlated low-heritability traits was still higher with varBayes analysis compared to single-trait analysis. Considering the benefit of greatly reduced computational time, varBayes was considered to be a practical method for predicting GBVs in multi-trait genomic selection.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
TH devised Bayesian prediction methods for the genomic selection of multiple traits, developed a program for simulations and drafted the manuscript. HI assisted in developing a program and drafted the final manuscript. Both authors read and approved the final manuscript.
Acknowledgements
This study was supported by the Grant-in-Aid for Scientific Research (B) of Japan Society for the Promotion of Science (Grant No. 22380010).