Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China

Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei 230031, China

Abstract

Background

Genomic breeding value estimation is the key step in genomic selection. Among many approaches, BLUP methods and Bayesian methods are most commonly used for estimating genomic breeding values. Here, we applied two BLUP methods, TABLUP and GBLUP, and three Bayesian methods, BayesA, BayesB and BayesCπ, to the common dataset provided by the 15^{th }QTL-MAS Workshop to evaluate and compare their predictive performances.

Results

For the 1000 progenies without phenotypic values, the correlations between GEBVs by different methods ranged from 0.812 (GBLUP and BayesCπ) to 0.997 (TABLUP and BayesB). The accuracies of GEBVs (measured as correlations between true breeding values (TBVs) and GEBVs) were from 0.774 (GBLUP) to 0.938 (BayesCπ) and the biases of GEBVs (measure as regressions of TBVs on GEBVs) were from 1.033 (TABLUP) to 1.648 (GBLUP). The three Bayesian methods and TABLUP had similar accuracy and bias.

Conclusions

BayesA, BayesB, BayesCπ and TABLUP performed similarly and satisfactorily and remarkably outperformed GBLUP for genomic breeding value estimation in this dataset. TABLUP is a promising method for genomic breeding value estimation because of its easy computation of reliabilities of GEBVs and its easy extension to real life conditions such as multiple traits and consideration of individuals without genotypes.

Background

The goal of genomic selection (GS) ^{th }QTL-MAS Workshop to evaluate and compare their predictive performances.

Methods

Dataset

The common dataset consisted of an outbred population, which had been simulated using the LDSO software

The final dataset used for evaluating genomic selection consisted of 3220 individuals, including 20 sires, 200 dams (each sire mated with 10 dams) and 3000 progenies (15 per dam). All individuals were genotyped for the 9990 SNPs without missing or genotyping error. Of the 15 progenies of each dam, 10 were phenotyped for a continuous trait. The 2000 progenies with phenotypic records and the other 1000 individuals (which had simulated true breeding values) without phenotypic records were treated as reference and validation population, respectively.

Estimation of variance components and EBVs

The variance components and the traditional BLUP EBVs were estimated using phenotypes and pedigree and the software DMUv6

where **y **is the vector of phenotypes of individuals in the reference population, **a **is the vector of additive genetic effects of the phenotyped individuals and their parents, **Z **is the incidence matrix of **a**, and **e **is the vector of residual errors. The variance-covariance matrices of **a **and **e **are

The reliabilities of the traditional EBVs were obtained from DMU directly and calculated as the square of the correlation between EBVs and the true unknown breeding values.

Estimation of SNP effects

BayesA, BayesB and BayesCπ were used to estimate SNP effects in the reference population based on the following model:

where **g **is the vector of random SNP effects, **X **is the matrix of genotype indicators (with values 0, 1, or 2 for genotypes 11, 12, and 22, respectively).

The differences between the three Bayesian methods lay in the assumptions for the prior distribution of SNP effects. BayesA assumes that all SNPs have an effect, but each has a different variance. BayesB and BayesCπ assume that each SNP has either an effect of zero or non-zero with probabilities π and 1-π, respectively, and for those having non-zero effects it is assumed that each SNP has a different variance in BayesB and a common variance in BayesCπ. In addition, in BayesB π is treated as a known parameter, while in BayesCπ it is treated as an unknown parameter with a uniform (0, 1) prior distribution. In this study, we set π = 0.99 for BayesB, and adopted the same prior distributions of **g **and **e **for the three Bayesian methods as those in

The Markov chain was run for 50,000 cycles of Gibbs sampling (for BayesB, 100 additional cycles of Metropolis-Hastings sampling were performed for the SNP effect variance in each Gibbs sampling cycle), and the first 5000 cycles were discarded as burn-in. All the samples of SNP effects after burn-in were averaged to obtain the SNP effect estimate.

Calculation of GEBVs

The genomic estimated breeding values (GEBVs) of all genotyped individuals were obtained using five methods: BayesA, BayesB, BayesCπ, GBLUP and TABLUP.

For BayesA, BayesB and BayesCπ, the GEBV of a genotyped individual was calculated as the sum of all marker effects according to its marker genotypes

For GBLUP and TABLUP, the GEBVs were estimated based on the following model:

where **u **is the vector of genomic breeding values of all genotyped individuals with the variance-covariance matrix equal to

The **G **matrix (realized relationship matrix) was constructed by using genotypes of all markers **TA **matrix (trait-specific marker-derived relationship matrix), was constructed by using genotypes of all markers with each marker being weighted with its estimated effect obtained from BayesB following the rules proposed by Zhang et al.

The accuracies of GEBVs were calculated as the correlation between GEBVs and the simulated true breeding values.

Results and discussion

Variance components

The estimated additive genetic variance and residual variance were 24.82 and 58.65, respectively. Therefore, the estimated heritability was 0.30. These estimates were used for the subsequent estimation of SNP effects and GEBVs.

Estimates of SNP effects

Figure

Absolute values of estimated SNP effects by BayesA (A), BayesB (B) and BayesCπ (C)

**Absolute values of estimated SNP effects by BayesA (A), BayesB (B) and BayesCπ (C)**.

Peak positions of profiles of the estimated SNP effects and the corresponding estimated SNP effects

**Method**

**Chr. 1**

**Chr. 2**

**Chr. 3**

**Chr. 4**

**Pos**.

**Effect**

**Pos**.

**Effect**

**Pos**.

**Effect**

**Pos**.

**Effect**

BayesA

59

5.19±0.37

3660

1.01±0.90

4094

2.25±0.40

3914

0.35±0.73

BayesB

59

1.96±2.13

3660

0.73±0.82

4092

0.91±1.17

3873

0.56±0.65

BayesCπ

58

5.15±0.42

3660

0.93±0.96

4092

2.50±0.76

7234

0.53±1.51

3873

0.76±0.75

4331

0.41±0.67

Simulated QTL

57

3638

4100

6644

3875

4300

Correlations between GEBVs by different methods and between EBVs and GEBVs for the 20 sires

For the 20 sires, the reliability of traditional EBVs was 0.95. Table

Correlations between GEBVs by different methods (the first 4 columns) and between traditional EBVs and GEBVs (the last column) for the 20 sires

**BayesB**

**BayesCπ**

**TABLUP**

**GBLUP**

**Traditional EBV**

BayesA

0.999

0.995

0.995

0.972

0.942

BayesB

0.992

0.998

0.978

0.947

BayesCπ

0.986

0.956

0.933

TABLUP

0.986

0.952

GBLUP

0.966

Correlations between GEBVs by different methods for the 1000 progenies without phenotypic values

Table

Correlations between GEBVs by different methods for the 1000 progenies without phenotypic values.

**BayesB**

**BayesCπ**

**TABLUP**

**GBLUP**

BayesA

0.991

0.985

0.983

0.841

BayesB

0.986

0.997

0.860

BayesCπ

0.976

0.812

TABLUP

0.876

Accuracies and biases of GEBVs

The availability of true breeding values (TBVs) of the 1000 progenies without phenotypic values allowed a more efficient assessment for methods. Table

Accuracies and biases of GEBVs for the 1000 progenies without phenotypic values.

**Method**

**
r
**

**
b
**

BayesA

0.924

1.063

BayesB

0.933

1.068

BayesCπ

0.938

1.057

TABLUP

0.924

1.033

GBLUP

0.774

1.648

TABLUP is an improvement of GBLUP in the way that the **G **matrix is replaced with **TA **matrix. In construction of the **TA **matrix, not only the marker genotypes, but also the marker effects are taken into account. The advantage of the **TA **matrix over the **G **matrix is that it not only accounts for the Mendelian sampling term, but also puts greater weight on loci explaining more of genetic variance for the trait of interest. This makes TABLUP more accurate than GBLUP. On the other hand, although TABLUP and the Bayesian methods gave similar accuracies, TABLUP has two important features that Bayesian methods lack. The first is that the reliability of an individual's GEBV can be calculated by TABLUP through the method outlined for GBLUP by VanRaden

Conclusions

BayesA, BayesB, BayesCπ and TABLUP performed similarly and satisfactorily and remarkably outperformed GBLUP for genomic breeding value estimation in this dataset. TABLUP is a promising method for genomic breeding value estimation because of its easy computation of reliabilities of GEBVs and its easy extension to real life conditions such as multiple traits and consideration of individuals without genotypes.

List of abbreviations used

QTL: quantitative trait locus; MAS: marker assisted selection; GS: genomic selection; BLUP: best linear unbiased prediction; GBLUP: BLUP with a realized relationship matrix; TABLUP: BLUP with a trait specific relationship matrix; EBV(s): estimated breeding value(s); GEBV(s): genomic estimated breeding value(s); TBV(s): true breeding value(s); SNP: single nucleotide polymorphism.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

CLW, PPM and ZZ contributed the data analyses and the manuscript. XDD and JFL contributed the modification of manuscript. WXF and ZQW carried out the data analyses. QZ coordinated the analyses and revised the manuscript. All authors have read and contributed to the final text of the manuscript.

Acknowledgements

This work was supported by the State High-Tech Development Plan of China (Grant No. 2008AA101002, 2011AA100302), the National Natural Science Foundation of China (Grant No. 30800776, 30972092, 31171200), Beijing Municipal Natural Science Foundation (Grant No. 6102016), and the Modern Pig Industry Technology System Program of Anhui Province.

This article has been published as part of