The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
Statistics Unit, Dalarna University, Borlänge, Sweden
Department of Animal Breeding & Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
Abstract
Background
Genome-wide dense markers have been used to detect genes and estimate relative genetic values. Among many methods, Bayesian techniques have been widely used and shown to be powerful in genome-wide breeding value estimation and association studies. However, computation is known to be intensive under the Bayesian framework, and specifying a prior distribution for each parameter is always required for Bayesian computation. We propose the use of hierarchical likelihood to solve such problems.
Results
Using double hierarchical generalized linear models, we analyzed the simulated dataset provided by the QTLMAS 2010 workshop. Marker-specific variances estimated by double hierarchical generalized linear models identified the QTL with large effects for both the quantitative and binary traits. The QTL positions were detected with very high accuracy. For young individuals without phenotypic records, the true and estimated breeding values had a Pearson correlation of 0.60 for the quantitative trait and 0.72 for the binary trait, where the quantitative trait had a more complicated genetic architecture involving imprinting and epistatic QTL.
Conclusions
Hierarchical likelihood enables estimation of marker-specific variances under the likelihoodist framework. Double hierarchical generalized linear models are powerful in localizing major QTL and computationally fast.
Background
Genetic analyses in livestock studies are generally based on information from pedigrees and molecular markers. Traditionally, a kinship matrix can be calculated using the pedigree data, which can be used in a
Dense marker genotypes along the genome can now be obtained affordably thanks to new and efficient methods for typing
The aim of this paper is to map QTL and report GEBV for the simulated dataset provided by the QTLMAS 2010 workshop. We employ a unified analysis via the
Methods
Data
The dataset used in this paper was simulated for the QTLMAS 2010 workshop (Poznań, Poland). A pedigree consisting of 3226 individuals in 5 generations (
Models
DHGLM provides a unified analysis for both QTL mapping and genomic breeding value estimation. Similar to BayesA, the data are modeled on two levels, i.e. both the phenotypic mean and the variance are modeled with random effects. For a quantitative trait, the phenotype y (
y = X
where g ~
log λ = 1
with an intercept
For the marker-specific variances, the correlated random effects, b, follow a multivariate normal distribution with a mean of zero and a variance-covariance matrix
The overall phenotypic variance can be expressed as
where
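The two-level structure described in this section can be illustrated with a small simulation. This is a minimal sketch rather than the authors' implementation: it assumes an additive mean model with marker effects g_j drawn with marker-specific variances λ_j, a log-linear variance model log λ = intercept + b, and an AR(1) correlation between neighbouring markers for b. The AR(1) form, the function name `simulate_two_level`, and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_two_level(n=200, m=50, alpha=-4.0, sigma_b=1.0, rho=0.9,
                       sigma_e=1.0, seed=0):
    """Simulate phenotypes from an assumed two-level (mean + variance) model."""
    rng = np.random.default_rng(seed)

    # Hypothetical genotype codes 0/1/2, centred per marker.
    Z = rng.integers(0, 3, size=(n, m)).astype(float)
    Z -= Z.mean(axis=0)

    # Assumed AR(1) correlation between markers: corr(b_j, b_k) = rho^|j-k|.
    idx = np.arange(m)
    R = rho ** np.abs(idx[:, None] - idx[None, :])

    # Variance level: log(lambda) = 1 * alpha + b, with b ~ N(0, sigma_b^2 R).
    b = rng.multivariate_normal(np.zeros(m), sigma_b ** 2 * R)
    lam = np.exp(alpha + b)

    # Mean level: g_j ~ N(0, lambda_j) and y = intercept + Z g + e.
    g = rng.normal(0.0, np.sqrt(lam))
    y = 1.0 + Z @ g + rng.normal(0.0, sigma_e, size=n)
    return y, Z, g, lam

y, Z, g, lam = simulate_two_level()
```

Because neighbouring entries of b are positively correlated in this sketch, the simulated marker-specific variances vary smoothly along the marker map, which is the kind of behaviour a spatial correlation parameter is meant to capture.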
Fitting algorithm
According to the extended likelihood principle, inference of the random SNP effects g should be drawn through the
• Solve the following WLS problem for
where
• Update
• Solve the following WLS problem for
where
• Update
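The alternation between a weighted least squares (WLS) fit and a variance update can be sketched in a few lines. The code below is a deliberately simplified, unsmoothed variant and should not be read as the authors' algorithm: it assumes an intercept-only fixed effect, solves the augmented WLS problem for the marker effects given the current variances, and then updates each marker-specific variance as g_j^2 / (1 - h_j), where h_j is the hat value of the j-th pseudo-observation. The smoothing through correlated random effects b is omitted, and the function name `fit_marker_variances`, the starting values, and the numerical floor are hypothetical.

```python
import numpy as np

def fit_marker_variances(y, Z, n_iter=30, floor=1e-8):
    """Iterate WLS for marker effects and a simple update of their variances."""
    n, m = Z.shape
    X = np.ones((n, 1))                  # intercept-only fixed effects (assumption)
    p = X.shape[1]

    # Augmented design: data rows on top, one pseudo-observation (0) per marker.
    T = np.block([[X, Z], [np.zeros((m, p)), np.eye(m)]])
    y_aug = np.concatenate([y, np.zeros(m)])

    lam = np.full(m, 0.1 * y.var())      # starting marker-specific variances
    sigma2 = y.var()                     # starting residual variance

    for _ in range(n_iter):
        # Weights: 1/sigma2 for data rows, 1/lambda_j for pseudo-observations.
        w = np.concatenate([np.full(n, 1.0 / sigma2), 1.0 / lam])
        TtW = T.T * w                    # T' W
        C = np.linalg.inv(TtW @ T)
        coef = C @ (TtW @ y_aug)         # WLS solution for (intercept, g)
        g = coef[p:]

        # Hat values: diagonal of T (T' W T)^{-1} T' W.
        h = np.sum((T @ C) * TtW.T, axis=1)

        # Residual variance from data rows, corrected by their leverages.
        resid = y - T[:n] @ coef
        sigma2 = max(np.sum(resid ** 2) / max(np.sum(1.0 - h[:n]), floor), floor)

        # Marker-specific variances from the pseudo-observation rows.
        lam = np.maximum(g ** 2 / np.maximum(1.0 - h[n:], floor), floor)

    return g, lam, sigma2
```

In this sketch, markers without a real effect see their variance shrink towards the floor over the iterations, while a marker with a large effect keeps a clearly larger variance. A smoothed version would replace the last update with a second-level model for log λ.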
Results and Discussion
Estimation of SNP effects
The effect of each SNP was estimated by a smoothed DHGLM with spatial correlation parameter
Estimated SNP effects. The SNP effects were estimated using the smoothed DHGLM with spatial correlation parameter
QTL mapping
Moving from the mean part to the variance (dispersion) part of the models, marker-specific variances were estimated and used to detect QTL (Figure
QTL detection using estimated marker-specific variances. The marker-specific variances were estimated using the smoothed DHGLM with spatial correlation parameter
Estimated heritability of the detected QTL and suggestive QTL for QT and BT.

Chromosome    Position (bp)    Heritability (QT)    Heritability (BT)
QTL
1             8396357          0.0106               0.0957
1             49965266         0.1096               -
2             32741451         0.0167               -
2             95418368         0.0177               -
3             22590128         0.0606               0.1101
3             71794627         0.0589               -
Suggestive QTL
1             49965266         -                    0.0859
2             79212967         0.0093               -
2             95418368         -                    0.0096
3             4590043          0.0109               -
3             39652617         0.0092               -
3             84974466         -                    0.0066
4             1456752          -                    0.0265
Sum                            0.3035               0.3342
GEBV
GEBV were estimated for all 3226 individuals in the pedigree. To examine out-of-sample prediction, we compared the GEBV with the true breeding values (TBV) for the young individuals (2327-3226) without phenotypic records (Figure
Scatterplots of GEBV against TBV for the young individuals without phenotypic records. The GEBV were estimated using the smoothed DHGLM with spatial correlation parameter
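The out-of-sample comparison amounts to a Pearson correlation restricted to the young individuals. The following is a minimal sketch under stated assumptions: `gebv` and `tbv` are hypothetical arrays ordered by individual ID (1 to 3226), the young individuals are taken to be IDs 2327 to 3226, and the data in the usage lines are synthetic rather than the workshop data.

```python
import numpy as np

def young_correlation(gebv, tbv, first_id=2327, last_id=3226):
    """Pearson correlation between GEBV and TBV for individuals first_id..last_id."""
    # IDs are 1-based, array positions are 0-based.
    idx = slice(first_id - 1, last_id)
    return float(np.corrcoef(gebv[idx], tbv[idx])[0, 1])

# Synthetic illustration only: TBV plus noise standing in for GEBV.
rng = np.random.default_rng(0)
tbv = rng.normal(size=3226)
gebv = tbv + rng.normal(scale=1.0, size=3226)
r_young = young_correlation(gebv, tbv)
```

Restricting the slice to individuals without phenotypic records keeps the evaluation free of the data used for fitting, which is what makes the correlation a measure of predictive ability.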
Conclusions
DHGLM was shown to be an efficient and reliable approach for both QTL mapping and genomic selection. Since a DHGLM can be estimated by iterating interlinked GLMs, the execution time is greatly shortened compared with Bayesian computation. On a Macintosh laptop with a 2 GHz processor and 4 GB memory (1067 MHz), it took about 10-20 minutes, depending on starting values, to obtain our results using our implementation in R. No priors are required for the parameters in DHGLM. The main QTL mapped via DHGLM were located with very good accuracy, although some QTL with small effects were shrunk or smoothed down. An R package iQTL has been implemented and is available on R-Forge:
List of abbreviations used
bp: base pair; DHGLM: double hierarchical generalized linear model; DNA: deoxyribonucleic acid; GEBV: genomic estimated breeding values; GLM: generalized linear model; GLMM: generalized linear mixed model; GWA: genome-wide association;
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
XS, LR and ÖC initiated the study. XS analyzed the simulated common dataset of the QTLMAS 2010 workshop and drafted the paper. LR initiated the smoothed version of double hierarchical generalized linear models. XS, LR and ÖC worked on the revision together and approved the final manuscript.
Acknowledgements
Xia Shen is funded by a Future Research Leaders grant from the Swedish Foundation for Strategic Research (SSF) to Örjan Carlborg. Lars Rönnegård is funded by the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS). François Besnier is acknowledged for sharing his IBD calculation program to validate our results by variance component methods.
This article has been published as part of