Animal Breeding and Genomics Centre, ASG Wageningen UR, PO Box 65, 8200 AB Lelystad, The Netherlands

Biosciences Research Division, Department of Primary Industries Victoria, 1 Park Drive, Bundoora 3083, Australia

Melbourne School of Land and Environment, The University of Melbourne, Parkville 3010, Australia

The Cooperative Research Centre for Beef Genetic Technologies, University of New England, Armidale, NSW 2351, Australia

Abstract

Genomic selection describes a selection strategy based on genomic estimated breeding values (GEBV) predicted from dense genetic markers such as single nucleotide polymorphism (SNP) data. Different Bayesian models have been suggested to derive the prediction equation, with the main difference centred around the specification of the prior distributions.

Methods

The simulated dataset of the 13th QTL-MAS workshop was analysed using four Bayesian approaches to predict GEBV for animals without phenotypic information. Different prior distributions were assumed to assess their effect on the accuracy of the predicted GEBV.

Conclusion

All methods produced GEBV that were highly correlated with the true breeding values. The models appear relatively insensitive to the choice of prior distributions for the QTL-MAS data set, which is consistent with the uniformity of performance of different methods found in real data.

Background

Genomic selection describes a technique for evaluating an animal's breeding value by simultaneously evaluating and summing marker effects across the genome. It uses panels of SNPs covering the whole genome so that ideally all QTL are in linkage disequilibrium with at least one marker, thereby maximizing the proportion of genetic variance explained by the SNPs.

Meuwissen et al (2001)

The aim of this study was to assess the effect that different prior distributions, and subsequently the models using these priors, had on the accuracy of estimated GEBV using the 13th QTL-MAS simulated data set, for which we had no prior knowledge of the trait's distribution of QTL effects.

Methods

Model

At each locus (the total number of loci being p) there are three possible combinations of two alleles (e.g. A or B): the homozygote for one allele (AA), the heterozygote (AB) and the homozygote for the other allele (BB). These are represented quantitatively by 0, 1 and 2, respectively. Phenotypic records at each time point were then modelled as:

where **1** is a vector of ones of length n,
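The 0/1/2 genotype coding and the idea of summing marker effects across the genome can be illustrated with a minimal Python sketch. The function names, the toy genotypes and the effect values are hypothetical, and the treatment of "BA" as equivalent to "AB" is an assumption; this is not the authors' software.

```python
import numpy as np

# Allele-count coding described in the text: AA -> 0, AB -> 1, BB -> 2.
# Treating "BA" the same as "AB" is an assumption made for this sketch.
GENOTYPE_CODE = {"AA": 0, "AB": 1, "BA": 1, "BB": 2}


def encode_genotypes(genotypes):
    """Convert an n x p nested list of genotype strings to an n x p matrix."""
    return np.array(
        [[GENOTYPE_CODE[g] for g in row] for row in genotypes], dtype=float
    )


def predict_gebv(X, snp_effects):
    """Sum the (estimated) marker effects over all loci for each animal."""
    return X @ snp_effects


# Toy data: 2 animals, 3 loci, with made-up SNP effects.
genotypes = [["AA", "AB", "BB"], ["BB", "BB", "AA"]]
X = encode_genotypes(genotypes)
effects = np.array([0.5, -0.2, 0.1])
gebv = predict_gebv(X, effects)
```

In a full analysis the SNP effects would come from one of the Bayesian models described below rather than being fixed by hand.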

Prior distributions for SNP effects and algorithms

Four differing sets of prior distributions were assessed and the specifications are shown in Table

Prior Distribution Specifications

**Method**

**Prior Distribution**

Bayes BLUP

Bayes A

Bayes A/B (Hybrid)

Bayes C

_{i}

1 - p(_{i}_{i}

_{i}^{th} SNP and _{i}^{th} SNP.

The other two models assumed mixture distributions for the SNP effects reflecting the assumption that there is a large number of SNPs with zero or near zero effects and a second smaller set of SNPs with larger significant effects. A Bayes A/B "hybrid" method was used. This approximation to Bayes B

A faster alternative to both the Bayes A/B hybrid and Bayes B is to use Stochastic Search Variable Selection (SSVS) _{i}^{th} SNP effect is sampled from the larger distribution (i.e. significant effect) or from the small distribution with near zero effects (see Table
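The indicator step of a two-component mixture such as SSVS can be sketched as follows. This assumes one common formulation (zero-mean normal components with a large and a small variance, and a prior probability `pi_large` of belonging to the large component); all names and parameter values are hypothetical and the authors' exact sampler may differ.

```python
import numpy as np


def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance `var`."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)


def prob_large(effect, pi_large, var_large, var_small):
    """Conditional probability that an effect comes from the large component."""
    w_large = pi_large * normal_pdf(effect, var_large)
    w_small = (1.0 - pi_large) * normal_pdf(effect, var_small)
    return w_large / (w_large + w_small)


def sample_indicator(effect, pi_large, var_large, var_small, rng):
    """Draw the 0/1 indicator for one SNP given its current effect."""
    p = prob_large(effect, pi_large, var_large, var_small)
    return int(rng.random() < p)


rng = np.random.default_rng(1)
# An effect near zero is mostly assigned to the small component,
# whereas a large effect is almost always assigned to the large one.
p_zero = prob_large(0.0, pi_large=0.5, var_large=1.0, var_small=0.01)
p_big = prob_large(2.0, pi_large=0.5, var_large=1.0, var_small=0.01)
indicator = sample_indicator(2.0, 0.5, 1.0, 0.01, rng)
```

Because neither component is exactly zero, every SNP keeps an effect in every iteration, which is one reason this kind of sampler can be faster than schemes that require a Metropolis-Hastings step.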

The algorithms associated with each model were run for 30,000 iterations with the first 10,000 discarded as burn-in.
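As a minimal sketch of how the burn-in is typically handled, the samples from the first 10,000 of the 30,000 iterations are dropped before summarising the chain. Using the posterior mean as the point estimate is an assumption here, and the simulated chain is only a stand-in for real sampler output.

```python
import numpy as np


def posterior_mean(chain, burn_in):
    """Average the sampled values after discarding the burn-in iterations."""
    chain = np.asarray(chain, dtype=float)
    if burn_in >= chain.shape[0]:
        raise ValueError("burn_in must be smaller than the chain length")
    return chain[burn_in:].mean(axis=0)


n_iter, burn_in = 30_000, 10_000
rng = np.random.default_rng(0)
# Stand-in for sampled effects of 3 SNPs (rows = iterations).
chain = rng.normal(loc=1.0, scale=0.1, size=(n_iter, 3))
estimate = posterior_mean(chain, burn_in)
```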

Results and Discussion

Prediction of breeding values at time point 600

The question of how to model the time series data and estimate GEBV at time point 600 was explored. However, little information was available to estimate inflection points or asymptotic values. The GEBV estimated at time points 265, 397 and 530 were found to have a linear relationship (i.e. they appeared to form the linear part of the growth curve). Consequently, as no information was available after time point 530 from which to predict asymptotes, the GEBV at time point 600 were estimated by fitting a linear regression through the breeding values at the three time points (265, 397 and 530).
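The extrapolation step can be sketched in Python as a per-animal least-squares line through the three GEBV, evaluated at t = 600. The function name and the toy values are hypothetical; only the time points come from the text.

```python
import numpy as np

TIME_POINTS = np.array([265.0, 397.0, 530.0])


def extrapolate_gebv(gebv_by_time, target_time=600.0, times=TIME_POINTS):
    """Fit a straight line per animal and evaluate it at `target_time`.

    gebv_by_time: array of shape (n_animals, len(times)).
    """
    gebv_by_time = np.asarray(gebv_by_time, dtype=float)
    # np.polyfit fits each column of y separately, so animals go in columns.
    slope, intercept = np.polyfit(times, gebv_by_time.T, deg=1)
    return slope * target_time + intercept


# Toy example: two animals whose GEBV are exactly linear in time.
toy = np.vstack([0.01 * TIME_POINTS + 1.0, -0.02 * TIME_POINTS + 3.0])
gebv_600 = extrapolate_gebv(toy)
```

With only three time points the fit cannot detect curvature beyond t = 530, which is the limitation noted above.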

Breeding values

The correlations between the GEBV (t=600) predicted by the alternative methods for the validation population containing the 50 full sib families without phenotypes are shown in Table

Correlations Between Estimated GEBV for unphenotyped animals at t=600

| | **Bayes C** | **Bayes A/B** | **Bayes BLUP** |
|---|---|---|---|
| **Bayes A** | 0.999 | 0.991 | 0.860 |
| **Bayes C** | 1 | 0.993 | 0.863 |
| **Bayes A/B** | | 1 | 0.893 |

Comparison of True and Estimated GEBV.

| **Method** | **Correlation** | **MSE** | **Rank** | **Regression coefficient** |
|---|---|---|---|---|
| Bayes BLUP | 0.885 | 5.479 | 0.691 | 0.979 |
| Bayes A | 0.857 | 7.092 | 0.696 | 1.162 |
| Bayes A/B | 0.889 | 5.435 | 0.73 | 1.081 |
| Bayes C | 0.861 | 6.561 | 0.71 | 1.024 |

Correlation: correlation coefficient between the true and predicted GEBV. MSE: mean square error. Rank: accuracy of predicting the best 100 animals. Regression coefficient: regression of the true breeding value on the estimated GEBV.
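The four comparison criteria can be computed along the following lines. This is a sketch under stated assumptions: in particular, "Rank" is interpreted here as the fraction of the true top animals that also appear in the predicted top set, which may not match the exact definition used in the study, and the function name is hypothetical.

```python
import numpy as np


def evaluate(true_bv, gebv, top_n=100):
    """Return correlation, MSE, top-n overlap and regression slope."""
    true_bv = np.asarray(true_bv, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    corr = np.corrcoef(true_bv, gebv)[0, 1]
    mse = np.mean((true_bv - gebv) ** 2)
    # Slope of the regression of true breeding value on estimated GEBV.
    slope = np.cov(true_bv, gebv)[0, 1] / np.var(gebv, ddof=1)
    # Assumed definition of "Rank": overlap between true and predicted top sets.
    top_true = set(np.argsort(true_bv)[-top_n:].tolist())
    top_pred = set(np.argsort(gebv)[-top_n:].tolist())
    rank = len(top_true & top_pred) / top_n
    return {"correlation": corr, "mse": mse, "rank": rank, "regression": slope}


# Toy check: predictions that are a perfect but rescaled copy of the truth.
true_bv = np.arange(10.0)
result = evaluate(true_bv, 2.0 * true_bv + 1.0, top_n=3)
```

A regression coefficient close to 1 indicates that the GEBV are on the same scale as the true breeding values, whereas the correlation is unaffected by such rescaling, as the toy check shows.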

The inclusion of a polygenic effect in the model (not simulated in the data) reduced the accuracy of prediction only slightly (by 0.01), and not significantly (results not shown). It was retained in the model because its inclusion has been shown to produce slightly better accuracies of prediction while reducing the bias of the variance components.

Bayes BLUP produced a significantly different set of GEBV. This is evident from its much lower correlations with the other methods and from the difference in regression coefficients between Bayes BLUP and the other methods. Despite these differences, Bayes BLUP produced good accuracy and a low MSE (Table 3). Hayes et al (2009)

Conclusion

All methods produced GEBV that were highly correlated (greater than 0.85) with the true breeding values despite diverse assumptions and prior distributions. This indicates that the hierarchical model is relatively insensitive to the choice of prior distributions for this data set. Thus all models perform well and this is consistent with the general uniformity of performance found across methods in real data.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

KV carried out the analyses and drafted the manuscript. PB developed the Bayes A and Bayes BLUP software. KV created the Bayes C and Hybrid software using the Bayes A software. BH and MG read and suggested improvements to the manuscript. All authors read and approved the final manuscript.

Acknowledgements

KV was funded by the Marie Curie Host Fellowships for Early Stage Research Training, as part of the 6th Framework Programme of the European Commission. This Publication represents the views of the Authors, not the European Commission, and the Commission is not liable for any use that may be made of the information.

This article has been published as part of BMC Proceedings Volume 4 Supplement 1, 2009: Proceedings of 13th European workshop on QTL mapping and marker assisted selection.

The full contents of the supplement are available online at