The Roslin Institute and R(D)SVS, University of Edinburgh, Roslin, Midlothian, EH25 9PS, UK

Abstract

Background

We used the Gompertz growth curve to model a simulated longitudinal dataset provided by the QTLMAS2009 workshop and applied genomic evaluation to the derived model parameters and to a model-predicted trait value.

Results

Prediction of phenotypic information from the Gompertz curve allowed us to obtain genomic breeding value estimates for a time point with no phenotypic records. Despite that the true model used to simulate the data was the logistic growth model, the Gompertz model provided a good fit of the data. Genomic breeding values calculated from predicted phenotypes were highly correlated with the breeding values obtained by directly using the respective observed phenotypes. The accuracies between the true and estimated breeding value at time 600 were above 0.93, even though t600 was outside the time range used when fitting the data. The analysis of the parameters of the Gompertz curve successfully discriminated regions with QTL affecting the asymptotic final value, but it was less successful in finding QTL affecting the other parameters of the logistic growth curve. In this study we estimated the proportion of SNPs affecting a given trait, in contrast with previously reported implementations of genomic selection in which this parameter was assumed to be known without error.

Conclusions

The two-step approach used to combine curve fitting and genomic selection on longitudinal data provided a simple way for combining these two complex tasks without any detrimental effect on breeding value estimation.

Background

A longitudinal trait is a composite of phenotypes recorded over time which have a complex genetic correlation structure. Different types of non-linear functions have been used to model a time-dependent trait and dissect its genetic components. For instance, the Gompertz model has been used for analysing the polygenic components

The objective of this study was to estimate genomic breeding values for the trait at time 600 (t600), which resided outside the range of longitudinal yield data provided by the QTLMAS2009 workshop. We implemented a two-step procedure in which first the Gompertz function was fitted to the data for each individual and, then genomic selection was performed on the predicted phenotype at t600 and on the parameter estimates derived from the fitted Gompertz curve.

Methods

Data

The data provided by QTLMAS2009 is fully described in

Procedure

To obtain genomic breeding values for t600, we used an approach composed by two independent steps: Firstly, a Gompertz growth curve was used to model the performance records across time, and to estimate the model descriptors (A, B, C) which best fit the phenotypes of each individual. Secondly, genomic evaluation was applied to obtain genomic estimated breeding values (GEBVs) for t600 using two different methods: I) estimating GEBVs for the model parameters (A, B, C; i.e. 3 GEBVs per individual) and using them to estimate the breeding value for t600 from the Gompertz function; II) predicting the phenotypes at t600 from evaluating the Gompertz function with the estimated parameters and later applying genomic selection on the predicted t600 phenotypes.

Growth model

The Gompertz equation is of the form: y(t) = Ae^{{-e[Be(C-t)/A]}}, where y(t) is the yield at time t; A the final yield; B the maximum growth rate and C the age at maximum growth rate. The curve fitting was implemented using nonlinear regression in SAS

Genomic evaluation

A Bayes B type of analysis was used as first described by Meuwissen

where **y** is the vector of phenotypes; **b** contains the fixed effects and **X** is its incidence matrix; α_{i} is the allelic substitution effect for SNP **g**_{i} is the vector of genotypes (1, 2 & 3 for genotypes 00, 10/01 and 11, respectively) for SNP **e** the vector of residuals distributed N(0,

The implementation of the model was done using Gibbs sampling. The parameters

For each analysis, a MCMC chain was run and the first 10000 cycles were discarded as burn-in period. Following this, 10000 realisations were collected, each separated by 50 cycles between consecutive realisations. The posterior mean was used as the estimate for each parameter of interest.

Results and discussion

Growth model parameters

The Gompertz model provided a good fit of the data (see additional files

Estimated means and standard errors for Gompertz model parameters and predicted weight at point 600

Non-linear distribution of yield of individuals across time (dots) and the average growth curve obtained after fitting the Gompertz model to the yield data (solid line).

Estimation of GEBVs for the parameters of the Gompertz curve

Univariate analyses were performed to each of the three parameters of the Gompertz function. The correlations between the univariate GEBVs for the three parameters were high (correlations between GEBVs for A-B, A-C and B-C were 0.97, 0.71 and 0.59, respectively. The posterior means of π for A, B and C were 0.059, 0.082 and 0.219, respectively. The posterior probabilities for the SNPs having an effect on the parameters A, B and C, and their estimated allelic substitution effects are shown in Figures

Probability of an individual SNP affecting the parameters A, B or C.

Probability of an individual SNP affecting the parameters A, B or C.

Estimated allele substitution effect for A, B and C. The size of the effects are rescaled relative to the largest allele effect within each parameter (i.e. highest effect=1).

Estimated allele substitution effect for A, B and C. The size of the effects are rescaled relative to the largest allele effect within each parameter (i.e. highest effect=1).

Estimation of GEBVs for the trait at a given time point

The GEBVs for t600 obtained by evaluating the Gompertz function with GEBVs for A, B and C (method I) were very similar to those calculated from method II which evaluated the predicted performance at t600 (see Figure

Pearson (lower diagonal) and Spearman (upper diagonal) correlations between true and estimated breeding values for t600 and the parameters used to simulate or analyse the data.^{12}

Scatter plot of t600 GEBVs calculated from evaluating the Gompertz function using GEBVs of A, B, C (x-axis) and calculated from Genomic selection on the predicted phenotype at t600 (y-axis). The GEBV not scaled on the same mean.

Scatter plot of t600 GEBVs calculated from evaluating the Gompertz function using GEBVs of A, B, C (x-axis) and calculated from Genomic selection on the predicted phenotype at t600 (y-axis). The GEBV not scaled on the same mean.

Comparison with the true model used to simulate the data

The true model used to simulate the data was the logistic growth curve (see _{1} have similar definition. The equivalence between the other parameters is less clear, which explains the results obtained. GEBVs for A, B were highly correlated to the true values for _{1}, but poorly with _{2} and _{3} (Table _{1}were simulated. Locations containing QTL for parameters _{2} and _{3} were less associated with QTL affecting the parameters of the Gompertz curve.

Pearson (lower diagonal) and Spearman (upper diagonal) correlations between true and estimated breeding values for t600 and the parameters used to simulate or analyse the data.^{1,2}

**TBV**

**GEBV**

**T600**

**Φ _{1}**

**Φ _{2}**

**Φ _{3}**

**t600_I**

**t600_II**

**A**

**B**

**C**

**TBV t600**

0.995

0.230

0.091

0.935

0.937

0.913

0.930

0.405

**TBV Φ _{1}**

0.997

0.285

0.160

0.931

0.937

0.928

0.925

0.465

**TBV Φ _{2}**

0.291

0.344

0.129

0.237

0.258

0.377

0.306

0.719

**TBV Φ _{3}**

0.098

0.157

0.108

0.082

0.112

0.213

0.029

0.463

**GEBV t600_I**

0.942

0.941

0.316

0.079

0.990

0.968

0.979

0.402

**GEBV t600_II**

0.947

0.949

0.332

0.116

0.990

0.969

0.981

0.437

**GEBV A**

0.919

0.933

0.459

0.194

0.970

0.969

0.957

0.599

**GEBV B**

0.938

0.940

0.396

0.034

0.983

0.983

0.971

0.454

**GEBV C**

0.519

0.571

0.735

0.433

0.523

0.551

0.709

0.587

^{1}**TBV t600, Φ _{1}, Φ_{2} and Φ_{2}**: true breeding values for t600, and parameters

^{2}The results shown here are higher than those presented to the QTLMAS workshop since an error in the implementation was found afterwards. Values presented at the workshop were 0.907 and 0.911 for the Pearson correlations of method I and II, and 0.886 and 0.891 for the Spearman correlations for methods I and II.

Probability of the SNP having an effect (A) and allele substitution (B) for the performance at time 600.

Despite that the Gompertz curve was not the true model, its use provided very accurate GEBVs for t600. The correlation between the true breeding values and GEBVs are presented in Table

In this study, the proportion of SNPs affecting the trait, π, was estimated in the analysis. This contrasts with previously reported implementations of Bayes B where π was assumed to be known without error. The π values were slightly overestimated, partly due to the low linkage disequilibrium between SNPs (average r^{2} between consecutive SNP was 0.15) and also to the fact that the Gompertz function was not the true model. However, the success in estimating such an important parameter from the data itself, even when assuming a uniform prior, provides an improvement in genomic evaluation relative to assuming that π is known without error.

Conclusions

The two-step approach of growth model fitting and genomic selection on model parameters and on predicted phenotype appeared to be a simple and reliable strategy. Despite that the Gompertz curve was not the true model used to simulate the data, the correlations between true and estimated breeding values at t600 were very high (Pearson and Spearman correlations above 0.93). The approach of estimating GEBVs for phenotype at a time of interest using GEBVs of the three parameters and evaluating the Gompertz function could be beneficial when GEBVs are needed for different time points. In this study, the proportion of SNP affecting the trait was estimated from the data, contrasting with previous implementation of genomic selection where this proportion has been assumed to be known without error. The results from this study showed that separate implementation of the growth modelling process and genomic evaluation provided huge simplification of the methodology with no detrimental effect on the final results.

List of abbreviations used

QTL: Quantitative Trait Locus; SNP: Single Nucleotide Polymorphism; GS: Genomic Selection; GEBV: Genomic Estimated Breeding value; MCMC: Monte Carlo Markov Chain

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

RPW and GH carried out the analyses and drafted the manuscript. Both authors have read and contributed to the final text of the manuscript.

Acknowledgements

GH acknowledges the GENACT Project, funded by the Marie Curie Host Fellowships for Early Stage Research Training, as part of the 6^{th} Framework Programme of the European Commission. This work has made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF) (

This article has been published as part of BMC Proceedings Volume 4 Supplement 1, 2009: Proceedings of 13th European workshop on QTL mapping and marker assisted selection.

The full contents of the supplement are available online at