Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, DK-8830 Tjele, Denmark

Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, 91775 Mashhad, Iran

Abstract

Background

In genomic models that assign an individual variance to each marker, each marker contributes only one degree of freedom (df) to the posterior distribution of its variance, so the model introduces many variance parameters with little information per parameter. A better alternative could be to form clusters of markers with similar effects, where markers in a cluster share a common variance. Therefore, the influence of each marker group of size

Methods

The simulated data from the 15th QTL-MAS workshop were analyzed by ranking SNP markers on their estimated effects and grouping markers with similar estimated effects. In step 1, all markers with a minor allele frequency above 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked according to their estimates from step 1, and each consecutive set of 150 markers was assigned to one group with a common variance. In further analyses, subsets of the 1500 and 450 markers with the largest effects in step 2 were kept in the prediction model.

Results

Grouping markers outperformed the SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than those of Bayesian methods with marker-specific variances.

Conclusions

Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. Prior knowledge of the genetic architecture of the trait is necessary for clustering markers and for appropriate prior parameterization.

Background

The statistical methods for genomic selection introduced by Meuwissen et al

In BayesA and BayesB the prior distribution of each marker effect is assumed normal with a marker specific variance

Methods

Model

First, an animal model BLUP using pedigree and phenotypes was performed to predict breeding values of all animals, and a REML estimate of heritability was obtained

The SNP-BLUP additive genomic model was used to estimate SNP effects in the first step as:

where $j$ indexes markers, $i$ indexes animals, and $\mathbf{e}$ is the vector of residuals $e_i$

The estimated marker effect for marker $j$

where the subscript $jk$ denotes marker $j$ in group $k$

The fully conditional posterior distributions, indexed by marker group $k$, were as follows:
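As an illustration of how a common group variance can be sampled, the sketch below assumes the standard conjugate result: with a scaled inverse chi-square prior (df $\nu$, scale $S^2$) on the variance of a group of $n_k$ marker effects, the full conditional is scaled inverse chi-square with $\nu + n_k$ df. The function name `sample_group_variance` and its arguments are hypothetical and are not taken from the paper; the exact form used by the authors may differ.

```python
import numpy as np


def sample_group_variance(effects, prior_df=0.0, prior_scale=0.0, rng=None):
    """Draw one value of a group's common marker variance.

    Assumes the standard conjugate full conditional:
    (prior_df * prior_scale + sum of squared effects) / chi-square(prior_df + n).
    prior_df = prior_scale = 0 corresponds to the non-informative setting.
    """
    rng = np.random.default_rng() if rng is None else rng
    effects = np.asarray(effects, dtype=float)
    post_df = prior_df + effects.size
    post_sum_sq = prior_df * prior_scale + np.sum(effects ** 2)
    return post_sum_sq / rng.chisquare(post_df)
```

With a group of 150 markers and zero prior df, each draw is informed by 150 df rather than the single df available to a marker-specific variance.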

In further analyses, the 1500 (SNP1500) and 450 (SNP450a, SNP450b) markers with the largest effects from model (2) were selected and allocated to groups of size 150 (for 1500 markers) and 75 or 50 (for 450 markers). Breeding values of animals without records were then predicted using the marker effects from these subsets of markers.
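The ranking and grouping step can be sketched as follows. This is a minimal illustration that assumes markers are ranked on the absolute value of their estimated effects; the function name `group_markers_by_effect` and its parameters are hypothetical.

```python
import numpy as np


def group_markers_by_effect(effects, group_size=150, n_keep=None):
    """Rank markers by absolute estimated effect and assign consecutive groups.

    Returns (order, group_of_rank): `order` holds marker indices from largest
    to smallest absolute effect, and `group_of_rank[r]` is the group label of
    the marker at rank r (group 0 holds the largest effects).
    """
    effects = np.asarray(effects, dtype=float)
    order = np.argsort(-np.abs(effects), kind="stable")
    if n_keep is not None:
        order = order[:n_keep]  # keep only the markers with the largest effects
    group_of_rank = np.arange(order.size) // group_size
    return order, group_of_rank
```

Under these settings, 1500 markers in groups of 150 give 10 groups, and 450 markers give 6 groups of 75 or 9 groups of 50.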

Gibbs sampling

Gibbs sampling was used to sample from the joint posterior distributions for all datasets. The chain length was 50,000 in all analyses; the first 20,000 samples were discarded as burn-in and every 30th sample was saved to compute the posterior means of the parameters. Preliminary experience showed that a burn-in of 20,000 samples was sufficient for convergence of the different parameters.
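The chain settings above imply 1000 saved samples per parameter, since (50,000 - 20,000) / 30 = 1000. The sketch below shows that bookkeeping; the helper names are hypothetical, and the exact thinning offset used by the authors is not stated, so the choice of keeping iterations 20,030, 20,060, ..., 50,000 is an assumption.

```python
import numpy as np


def saved_iterations(chain_length=50_000, burn_in=20_000, thin=30):
    """Return the 1-based iteration numbers kept after burn-in and thinning."""
    return list(range(burn_in + thin, chain_length + 1, thin))


def posterior_mean(chain, burn_in=20_000, thin=30):
    """Average the saved samples of a chain stored in iteration order.

    chain[t - 1] is the sample from iteration t, so the first saved
    sample sits at index burn_in + thin - 1.
    """
    chain = np.asarray(chain, dtype=float)
    return chain[burn_in + thin - 1::thin].mean(axis=0)
```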

Results and discussion

The challenge was to predict the breeding values of 1000 genotyped animals with no phenotypes. The available data comprised 2000 animals with both genotypes and phenotypes. We validated the models using 200 animals: the last progeny with a genotype and phenotype record from each dam was removed from the data and used for validation, and the remaining 1800 animals were used to train the model. Based on this validation, the size of the SNP groups was chosen to be 150, and the scale and df of the prior distribution of marker variances were set to zero because this resulted in the highest accuracy for the validation animals. A scale and df setting of zero corresponds to the so-called Jeffreys or non-informative prior for variances. After the true breeding values of the other 1000 animals were provided, it turned out that other prior specifications for the marker variances could give better predictive ability. A possible reason is that the 200 validation animals were too few to represent the whole population. Further, there was an imprinted QTL whose effect is expressed only if it has been transmitted from one particular parent. It is likely that most or all of these 200 animals received the paternal (maternal) imprinted QTL. Further details of the impact of the prior specification of the marker variances on the estimation of SNP effects and breeding values are discussed below.
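The validation split described above can be sketched as follows. The sketch assumes the records are available as a list of (animal, dam) pairs sorted in birth order, so that the last record seen for a dam is her last progeny; the function name and the data layout are hypothetical.

```python
def split_last_progeny_per_dam(records):
    """Split animals into training and validation sets.

    records: list of (animal_id, dam_id) pairs in birth order (assumed).
    The last progeny of each dam goes to validation; the rest to training.
    """
    last_by_dam = {}
    for animal, dam in records:
        last_by_dam[dam] = animal  # later records overwrite earlier ones
    validation = set(last_by_dam.values())
    training = [animal for animal, _ in records if animal not in validation]
    return training, sorted(validation)
```

With 2000 recorded animals from 200 dams, this yields 1800 training and 200 validation animals, matching the counts reported above.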

Accuracy of predicted breeding values

The accuracies of predicted breeding values (PBV) from the two-step method were higher than PBV from animal model BLUP and SNP-BLUP (Table

Correlation between predicted breeding values of unphenotyped animals and their true genetic values or expected genetic values of their progeny

| **Method** | **Genetic value** | **Progeny value** |
|---|---|---|
| BLUP | 0.608 | 0.595 |
| SNP-BLUP | 0.825 | 0.822 |
| All_SNP^{1} | 0.862 | 0.841 |
| SNP1500^{1} | 0.861 | 0.840 |
| SNP450a^{2} | 0.856 | 0.830 |
| SNP450b^{3} | 0.854 | 0.823 |

^{1, 2, 3} Group sizes were 150, 75 and 50, respectively.
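Accuracies of this kind are correlations between predicted and reference values. A minimal sketch, assuming the Pearson correlation is the measure used (the function name is hypothetical):

```python
import numpy as np


def prediction_accuracy(predicted, reference):
    """Pearson correlation between predicted breeding values and reference
    values (true genetic values or expected progeny values)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.corrcoef(predicted, reference)[0, 1])
```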

Variance components and heritability

Table

Estimates of heritability from different methods

| **Method** | **Heritability** |
|---|---|
| True | 0.300 |
| REML | 0.297 |
| SNP-BLUP | 0.289 |
| All_SNP^{1} | 0.355 |
| SNP1500^{1} | 0.355 |
| SNP450a^{2} | 0.357 |
| SNP450b^{3} | 0.356 |

^{1, 2, 3} Group sizes were 150, 75 and 50, respectively.

Prior distribution for variances

Overestimation of the heritability in the two-step method was mainly due to the prior setting for the SNP-group variances (scale = 0, df = 0, corresponding to the Jeffreys or non-informative prior). To investigate the effect of the prior df, two other analyses with either 50 or 150 degrees of freedom were run with all markers (extensions of All_SNP), where the scale parameter was updated using equation (3). Figure

**Marker-group heritabilities for different prior degrees of freedom for the group variances**. Markers were grouped such that the first group consisted of the markers with the largest effects in the SNP-BLUP analysis and the last group consisted of the markers with the smallest effects on the trait.

Conclusions

Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. Prior knowledge of the genetic architecture of the trait is necessary for clustering markers and for appropriate prior parameterization. In the workshop data set, the presented approach of grouping SNPs gave better predictions than a SNP-BLUP model, but worse predictions than a mixture (BayesB type) model. However, the workshop data set had a limited number of QTL, which may not be representative of many real data sets. In real data, mixture models often show little advantage over SNP-BLUP, and because our method clearly outperformed SNP-BLUP, it could be of interest for further study in real data.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MMS analyzed the data and drafted the paper. All authors contributed to planning the study, discussing the results, and reading and editing the paper.

Acknowledgements

MMS was funded by The Danish Research Agency grant no. 274-08-0068.

This article has been published as part of