Genus plc., 100 Bluegrass Commons Blvd., Suite 2200, Hendersonville, TN, 37075, USA

North Carolina State University, Department of Animal Science, Raleigh, NC, 27695-7627, USA

Abstract

Background

Bayesian approaches for predicting genomic breeding values (GEBV) have been proposed that allow for different variances for individual markers, resulting in a shrinkage procedure that uses prior information to coerce negligible effects towards zero. These approaches have generally assumed application to high-density genotype data on all individuals, which may not be the case in practice. In this study, three approaches were compared for their predictive power in computing GEBV when training at high SNP marker density and predicting at high or low densities: the well-known

Results

The GEBV accuracy (calculated as correlation between GEBV and traditional breeding values) was highest for

Conclusions

In this dataset the

Background

A number of approaches have recently been proposed for the prediction of genomic breeding values for high-density single nucleotide polymorphism (SNP) panels. Methods commonly used fall into two categories,^{2} prior is assigned to SNP variances. Scale and degrees of freedom of the distribution are in this case set as hyperparameters and samples of the posterior distribution are obtained through MCMC methods. A generalization in which the hyperparameters regulating the shrinkage are treated as unknown parameters and estimated from the data leads to the well known

This study investigated the predictive performance of different Bayesian hierarchical approaches,

Methods

The dataset used for analysis was simulated as part of the 13th QTL-MAS Workshop, see

Prediction of phenotypes and breeding values

The simulated dataset included phenotypes for five traits representing measures of yield at five different time points (t0, t132, t265, t397 and t530). A sixth phenotype was predicted to represent yield at a time point beyond the simulated data, time point 600 (t600). A number of non-linear models were tested to predict t600

Description of models

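The data were analyzed using three different approaches, considering additive genetic effects only. The general structure of the models in matrix form was:

$$\mathbf{y} = \boldsymbol{\mu} + \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$$

where **y** is the vector of phenotypic effects, **µ** is the overall mean, **β** is the vector of SNP effects, **X** is the matrix of SNP genotypes, and **e** is a vector of residuals assumed N(0,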

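The three approaches differed in the prior for the effect of the j^{th} SNP, $\beta_j$: the effects were sampled from a normal distribution (N) with SNP-specific variance, and the hierarchy included a squared hyperparameter sampled from a Gamma distribution.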

The Gibbs sampling algorithm for all three methods was implemented in R
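To make the sampling scheme more concrete, the following is a minimal sketch of a single-site Gibbs sampler with Bayes-A style priors. It is written in Python rather than R and is not the authors' implementation. The function name `bayes_a_gibbs`, the hyperparameter values `nu` and `s2`, the flat prior on the mean, the `n - 2` degrees of freedom used for the residual variance and the toy data are all assumptions made for illustration.

```python
import random


def bayes_a_gibbs(X, y, n_iter=400, burn_in=100, nu=4.0, s2=0.01, seed=1):
    """Gibbs sampler for y = mu + X b + e with a separate variance per SNP.

    nu and s2 are the fixed degrees of freedom and scale of an assumed
    scaled inverse chi-square prior on each SNP variance.
    Returns the posterior means of mu and of every SNP effect.
    """
    rng = random.Random(seed)
    n, m = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(m)]   # one list per SNP
    xtx = [sum(x * x for x in col) for col in cols]

    def chi2(df):
        # A chi-square variate with df degrees of freedom is Gamma(df/2, scale=2).
        return rng.gammavariate(df / 2.0, 2.0)

    mu = sum(y) / n
    b = [0.0] * m
    var_b = [s2] * m
    resid = [yi - mu for yi in y]          # current residuals y - mu - X b
    var_e = sum(r * r for r in resid) / n
    mu_sum, b_sum, kept = 0.0, [0.0] * m, 0

    for it in range(n_iter):
        # Overall mean: normal full conditional under an assumed flat prior.
        new_mu = rng.gauss((sum(resid) + n * mu) / n, (var_e / n) ** 0.5)
        resid = [r - (new_mu - mu) for r in resid]
        mu = new_mu

        for j, col in enumerate(cols):
            # Effect of SNP j given everything else, then its own variance.
            rhs = sum(x * r for x, r in zip(col, resid)) + xtx[j] * b[j]
            lhs = xtx[j] + var_e / var_b[j]
            new_b = rng.gauss(rhs / lhs, (var_e / lhs) ** 0.5)
            delta = new_b - b[j]
            resid = [r - x * delta for x, r in zip(col, resid)]
            b[j] = new_b
            var_b[j] = (nu * s2 + new_b * new_b) / chi2(nu + 1.0)

        # Residual variance: scaled inverse chi-square draw (assumed prior).
        var_e = sum(r * r for r in resid) / chi2(n - 2.0)

        if it >= burn_in:
            kept += 1
            mu_sum += mu
            b_sum = [s + v for s, v in zip(b_sum, b)]

    return mu_sum / kept, [s / kept for s in b_sum]


# Toy data: 200 animals, 10 SNPs coded 0/1/2, only the first SNP has an effect.
sim = random.Random(0)
X = [[sim.choice([0, 1, 2]) for _ in range(10)] for _ in range(200)]
y = [10.0 + 2.0 * row[0] + sim.gauss(0.0, 1.0) for row in X]
mu_hat, b_hat = bayes_a_gibbs(X, y)
print([round(v, 2) for v in b_hat])
```

Because each SNP carries its own variance, a SNP with a large sampled effect receives a large variance in the next draw and is shrunk little, while SNPs with small effects are pulled strongly towards zero.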

Genomic breeding values were calculated for all individuals in the prediction set, for t530 and t600 by:

where **X _{m}**
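One common way to compute a genomic breeding value from estimated SNP effects is to sum, over SNPs, the genotype multiplied by the estimated effect. The sketch below assumes that form and an allele-count coding of 0, 1 or 2; the function name `genomic_breeding_values` and the toy numbers are illustrative and not taken from the study.

```python
def genomic_breeding_values(genotypes, snp_effects):
    """Return one GEBV per animal as the sum over SNPs of genotype * effect."""
    for row in genotypes:
        if len(row) != len(snp_effects):
            raise ValueError("each genotype row needs one entry per SNP effect")
    return [sum(x * b for x, b in zip(row, snp_effects)) for row in genotypes]


# Three animals, three SNPs (hypothetical values).
effects = [0.5, -0.25, 0.0]
genotypes = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]
gebv = genomic_breeding_values(genotypes, effects)
print(gebv)
```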

Low-density marker subsets

Subsets of the prediction dataset were created to simulate the situation where training can be done at high density, but prediction of GEBV occurs with a lower density panel. In this case the full training set, including 1000 individuals and 453 SNPs, was used to estimate the SNP effects, but GEBV were calculated using either a smaller subset of SNPs or a combination of genotypes for a small subset and genotype probabilities for the remaining markers (see Table

Number of SNPs included in the calculation of genomic breeding values in each low-density scenario

| Scenario | Evenly-spaced^{a} | Largest effects^{b} | Genotype probabilities^{c} | Total |
|---|---|---|---|---|
| EVEN_19 | 19 | | | 19 |
| EVEN_38 | 38 | | | 38 |
| EVEN_76 | 76 | | | 76 |
| SIG_19 | | 19 | | 19 |
| SIG_38 | | 38 | | 38 |
| SIG_76 | | 76 | | 76 |
| EVEN_GP_19 | 19 | | 434 | 453 |
| EVEN_GP_38 | 38 | | 415 | 453 |
| EVEN_GP_76 | 76 | | 377 | 453 |
| SIG_GP_19 | | 19 | 434 | 453 |
| SIG_GP_38 | | 38 | 415 | 453 |
| SIG_GP_76 | | 76 | 377 | 453 |

^{a}Selected by taking the ^{th} SNPs from the ordered list, so that the SNPs were approximately evenly spaced

^{b}Selected by taking the top

^{c}Genotype probabilities were used in place of actual genotypes for all SNPs that do not fall into one of the other categories, within a scenario
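The two ways of choosing a low-density panel can be sketched as follows. The step sizes of 24, 12 and 6 are an assumption: they are not stated here, but taking every 24th, 12th or 6th of 453 ordered SNPs happens to give panels of 19, 38 and 76 markers. Ranking by absolute estimated effect is likewise assumed, and the function names are hypothetical.

```python
def evenly_spaced(n_snps, step):
    """Indices of every step-th SNP in map order, starting with the first."""
    return list(range(0, n_snps, step))


def largest_effects(effects, k):
    """Indices of the k SNPs with the largest absolute estimated effects."""
    order = sorted(range(len(effects)), key=lambda j: abs(effects[j]), reverse=True)
    return sorted(order[:k])


# Panel sizes obtained from 453 SNPs with the assumed step sizes.
panel_sizes = {step: len(evenly_spaced(453, step)) for step in (24, 12, 6)}
print(panel_sizes)

# Toy effects: the two largest in absolute value are at positions 1 and 4.
top_two = largest_effects([0.1, -0.9, 0.3, 0.0, 0.5], 2)
print(top_two)
```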

Genomic breeding values were calculated for the marker subsets as above by:

In this case, the individual element (** X** is calculated as:

where
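A plausible way to use genotype probabilities in place of an observed genotype is to substitute the expected allele count, that is, each possible genotype weighted by its probability. The sketch below assumes that form and a 0/1/2 coding; it is not confirmed as the calculation used in this study, and the function names are hypothetical.

```python
def expected_allele_count(probs):
    """Expected genotype from probabilities of carrying 0, 1 or 2 copies."""
    p0, p1, p2 = probs
    if abs(p0 + p1 + p2 - 1.0) > 1e-8:
        raise ValueError("genotype probabilities must sum to 1")
    return 1.0 * p1 + 2.0 * p2


def genotype_row(observed, probabilities):
    """Build one animal's row of X.

    observed[j] is the genotype of SNP j if it is on the low-density panel,
    otherwise None; probabilities[j] then holds (p0, p1, p2) for that SNP.
    """
    row = []
    for geno, probs in zip(observed, probabilities):
        row.append(float(geno) if geno is not None else expected_allele_count(probs))
    return row


# One animal: SNP 0 genotyped, SNPs 1 and 2 represented by probabilities.
row = genotype_row([2, None, None], [None, (0.0, 1.0, 0.0), (0.25, 0.5, 0.25)])
print(row)
```

The resulting row can be multiplied by the SNP effects estimated in the high-density training set in the same way as a row of observed genotypes.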

Results

Accuracies of the GEBV were calculated for each of the three approaches (

Correlations between genomic breeding values and breeding values from a traditional animal model for animals in the prediction set (without phenotypes) and coefficients of regression of traditional on genomic breeding values, for t530 and t600.

| Method | t530 Corr. | t530 b | t600 Corr. | t600 b |
|---|---|---|---|---|
| | 0.673 | 0.893 | 0.674 | 0.880 |
| | 0.718 | 1.019 | 0.720 | 1.010 |
| | 0.736 | 1.061 | 0.737 | 1.072 |
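The two statistics reported in this table can be computed as in the sketch below: the Pearson correlation between genomic and traditional breeding values, and the slope from regressing traditional on genomic breeding values. The function name `accuracy_and_slope` and the toy numbers are illustrative only.

```python
def accuracy_and_slope(gebv, ebv):
    """Return (correlation, slope of the regression of ebv on gebv)."""
    n = len(gebv)
    mean_g = sum(gebv) / n
    mean_e = sum(ebv) / n
    s_gg = sum((g - mean_g) ** 2 for g in gebv)
    s_ee = sum((e - mean_e) ** 2 for e in ebv)
    s_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(gebv, ebv))
    corr = s_ge / (s_gg * s_ee) ** 0.5
    slope = s_ge / s_gg   # traditional breeding value regressed on GEBV
    return corr, slope


# Toy example in which the GEBV are exactly half the traditional values.
corr, slope = accuracy_and_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(corr, slope)
```

A slope below 1 indicates that the genomic values are more spread out than the traditional breeding values, and a slope above 1 indicates that they are less spread out.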

Correlations of GEBV for each low-density SNP scenario (Table

Correlations between genomic breeding values and breeding values from different low SNP-density approaches (and change in correlation compared to original full marker model), where all SNP effects are estimated in the same high SNP-density training set, for t530 and t600.

Values in parentheses are the change in correlation compared to the original full marker model.

| Scenario | t530 Bayes-A | t530 Student-t | t530 Lasso | t600 Bayes-A | t600 Student-t | t600 Lasso |
|---|---|---|---|---|---|---|
| EVEN_19 | 0.255 (-0.418) | 0.142 (-0.846) | 0.195 (-0.594) | -0.128 (-0.532) | 0.098 (-0.622) | 0.173 (-0.564) |
| EVEN_38 | 0.481 (-0.192) | 0.494 (-0.249) | 0.528 (-0.242) | 0.469 (-0.180) | 0.485 (-0.235) | 0.522 (-0.215) |
| EVEN_76 | 0.490 (-0.183) | 0.544 (-0.246) | 0.586 (-0.192) | 0.472 (-0.130) | 0.532 (-0.188) | 0.584 (-0.153) |
| SIG_19 | 0.663 (-0.010) | 0.699 (-0.049) | 0.709 (-0.037) | 0.669 (0.025) | 0.692 (-0.028) | 0.709 (-0.028) |
| SIG_38 | 0.664 (-0.009) | 0.703 (-0.049) | 0.713 (-0.033) | 0.669 (0.029) | 0.707 (-0.013) | 0.721 (-0.016) |
| SIG_76 | 0.667 (-0.006) | 0.709 (-0.046) | 0.711 (-0.027) | 0.672 (0.035) | 0.712 (-0.008) | 0.729 (-0.008) |
| EVEN_GP_19 | 0.937 (0.264) | 0.967 (0.210) | 0.980 (0.231) | 0.928 (0.293) | 0.967 (0.247) | 0.978 (0.241) |
| EVEN_GP_38 | 0.733 (0.060) | 0.785 (0.018) | 0.861 (-0.049) | 0.736 (0.111) | 0.789 (0.069) | 0.862 (0.125) |
| EVEN_GP_76 | 0.733 (0.060) | 0.786 (0.018) | 0.854 (-0.050) | 0.736 (0.112) | 0.789 (0.069) | 0.856 (0.119) |
| SIG_GP_19 | 0.674 (0.001) | 0.730 (0.043) | 0.802 (-0.006) | 0.675 (0.056) | 0.735 (0.015) | 0.798 (0.061) |
| SIG_GP_38 | 0.673 (0) | 0.728 (-0.043) | 0.783 (-0.008) | 0.675 (0.054) | 0.731 (0.011) | 0.791 (0.054) |
| SIG_GP_76 | 0.673 (0) | 0.724 (-0.044) | 0.767 (-0.012) | 0.674 (0.050) | 0.729 (0.009) | 0.769 (0.032) |

Discussion

The three methods applied to the simulated data performed similarly (Table ^{2}

SNP effects estimated by Bayes-A, Student-t and Lasso for t600, by genome location (cM).


The use of low-density SNP subsets is based on the concept of Habier

The scenarios using genotype probabilities performed well and in most cases showed a small or no reduction in accuracy, compared to using the full marker set. Due to the population structure (full and half-sib families) and completeness of parental genotypes it is expected that the genotype probabilities are a good representation of the true genotypes in this case. In a situation where there are fewer ties between individuals the advantage of using genotype probabilities (in place of actual genotypes) is likely to be lower than what was found in this study. A number of the scenarios even showed large increases in accuracy to unrealistic levels (e.g., EVEN_GP_19, Table

Epilogue

The availability of true breeding values (TBV) allowed for an improved evaluation of the effectiveness of the three analysis methods on alternative marker sets (Table

Accuracy of genomic breeding values using three methods, as the correlation between true and predicted breeding values, for animals in the prediction set using all markers (ALL) and using alternative low-density approaches, for t600.

| Scenario | Bayes-A | Student-t | Lasso |
|---|---|---|---|
| ALL | 0.916 | 0.945 | 0.916 |
| EVEN_19 | 0.040 | 0.206 | 0.258 |
| EVEN_38 | 0.732 | 0.738 | 0.738 |
| EVEN_76 | 0.734 | 0.761 | 0.758 |
| SIG_19 | 0.913 | 0.931 | 0.910 |
| SIG_38 | 0.915 | 0.938 | 0.914 |
| SIG_76 | 0.915 | 0.943 | 0.921 |
| EVEN_GP_19 | 0.658 | 0.674 | 0.671 |
| EVEN_GP_38 | 0.833 | 0.84 | 0.817 |
| EVEN_GP_76 | 0.834 | 0.846 | 0.825 |
| SIG_GP_19 | 0.914 | 0.937 | 0.914 |
| SIG_GP_38 | 0.915 | 0.940 | 0.917 |
| SIG_GP_76 | 0.916 | 0.943 | 0.920 |

Conclusions

For this simulated dataset the

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MAC performed analyses, participated in study design and drafted the manuscript. SF performed analyses and participated in study design. ND participated in study design and helped to interpret results. CM developed the methods for effect estimation, performed analyses, participated in study design and helped draft the manuscript. All authors read and approved the final manuscript.

Acknowledgement

This article has been published as part of BMC Proceedings Volume 4 Supplement 1, 2009: Proceedings of 13th European workshop on QTL mapping and marker assisted selection.

The full contents of the supplement are available online at