Abstract
Background
Molecular marker information is a common source for drawing inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, differ from this simple assumption. One way to better understand the genetic architecture of complex traits is to include intralocus (dominance) and interlocus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interactions for thousands of loci would probably go beyond the scope of such a sampling algorithm, because millions of effects would then have to be estimated simultaneously, leading to months of computation time. Alternative solving strategies are required when epistasis is studied.
Methods
We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include nonadditive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa.
Results
If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects.
Conclusions
This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and nonadditive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source.
1 Background
Molecular marker information is commonly used to draw inferences about the relationship between genetic and phenotypic variation in various species, e.g. humans [1], dairy cattle [2] or mice [3]. Assuming linkage disequilibrium (LD) between quantitative trait loci (QTL) and markers, genetic effects can be estimated and explained as QTL effects captured by the neighbouring markers. If breeding values are the focal point, genetic effects are typically modelled as additively acting marker allele effects (e.g. [4,5]). The mode of biological action can, of course, be different from the assumption of pure additivity. One possibility to better understand the genetic architecture of complex traits is to include intralocus (dominance) and interlocus (epistasis) interaction of alleles when fitting a model to a trait. The importance of nonadditive effects for genetic variation has recently been investigated. Knowledge about nonadditive effects is essential to benefit, for example, from heterosis effects [6], especially for crossbreeding schemes (poultry, plants etc.). In general, it can be expected that the prediction of the genetic value, in particular its additive part, is improved if nonadditive effects are additionally modelled. For instance, Lee et al. [7] reported that the accuracy of prediction increased considerably when dominance effects were included compared to a purely additive genetic model when the phenotypes coat colour (+17% accuracy) or the percentage of CD8^{+} cells (+2% accuracy) were studied in mice. Added epistasis did not, however, contribute to the accuracy in this case. In an example with recombinant inbred lines of soybean [8], the accuracy of prediction was more than doubled under the epistatic model. Even though nonadditive effects may occur on the level of gene action, most of the genetic variation might be assigned to additive effects when genes are at an extreme frequency [9].
The extent to which, for example, epistasis is involved in regulating complex traits is hardly known, but knowledge about it can be used to infer biological mechanisms and to reconstruct biological pathways [10]. In one of the first studies concerning nonadditive influence on growth differences in chickens, Carlborg et al. [11] estimated that 10% of genetic variation in early growth (trait Gr18) was due to dominance and even 70% due to epistasis. This example showed the importance of interacting loci, though one may suppose an overestimation of the epistatic effects, a phenomenon already known as the Beavis effect [12] for single loci. Since this experiment was based on a cross of extremely different lines, further investigations are required to find evidence for interacting genes in purebreds.
Different approaches are available to model additive and nonadditive genetic effects. Under the aspect of QTL detection, a genome scan can be carried out to uncover genetic effects using, for example, a variance component method [13,14]. If additive and nonadditive effects are to be modelled simultaneously over the whole genome, we have to be aware of "p bigger than n" problems, meaning there are more parameters than there are observations. To cope with the all-in-one situation, Xu presented a Bayesian approach [15], which parallels the idea of BayesA [4], and an empirical Bayes method [16], both enabling the genome-wide estimation of additive and nonadditive marker effects. The Bayesian methods commonly used for the estimation of additive effects apply Markov chain Monte Carlo (MCMC) simulations, which require a lot of computing time but perform convincingly in terms of accuracy in predicting genetic values. In particular, the BayesB approach [4] is superior to other methods, for instance ridge regression and partial least squares [17-19]. The MCMC sampling methods may collapse under high marker density if further nonadditive effects are included. As an alternative, an approximate Bayesian approach is available which applies the analytically derived posterior density for a marker effect rather than samples thereof [20]. This approach (called fBayesB) was shown to be slightly less accurate, because in an iterative procedure only a single marker effect is studied at a time while the vector of phenotypes is corrected for all other previously estimated effects. The fBayesB strategy is much faster than the conventional Bayesian methods using MCMC. This solving approach offers the possibility to additionally account for genome-wide interacting effects and to estimate them with reasonable computational effort.
The objective of this study is to explore the impact of nonadditive effects on the prediction of genetic values in a livestock population. We aim at an improved estimation of additive effects and a better prediction of genetic values when additive and nonadditive effects are jointly involved in fitting a model to a trait. Since methods that aim to estimate nonadditive effects in arbitrary populations are just emerging, it is especially important to validate such approaches with simulations. Therefore, with this study, we pursue methodological aspects, thereby assembling facts that help to interpret results obtained with practical data in future work. We consider additive, dominance and pairwise epistatic effects captured by biallelic markers spread over the whole genome. The details of statistical modelling are presented in the first part of the paper. We extend the fast Bayesian method (fBayesB), which was developed under pure additivity [20], to include nonadditive effects. fBayesB is used to estimate the genetic effects on the basis of simulated datasets which resemble a dairy cattle population. Different scenarios are simulated to study the loss of accuracy of prediction if epistatic effects are not simulated but modelled and vice versa. In the second part, we summarise the results of analysing the simulated data. The amount of genetic variation assigned to each kind of genetic effect after genome-wide estimation of marker effects is determined. To briefly show how the approach behaves in practice, we also apply fBayesB to a real data example. In the third part, we outline some constraints of estimating nonadditive effects via the fBayesB approach and discuss other solving strategies.
2 Methods
2.1 Statistical model
For the statistical analysis of genetic effects in a Bayesian framework, a hierarchical model is constructed similar to that of Meuwissen et al. [20]. Bold symbols are used for vectors and matrices. At first, only main genetic effects (i.e. additive and dominance effects) are included. In total m loci are studied on the genome. The vector of phenotypes y = (y_{1}, ..., y_{n})' is modelled as
y = 1μ + Xa + Dd + e,
where e denotes the vector of residual effects.
This model is set up in the way of an F_{∞} model [21]. Let μ be the population mean and 1 a vector of ones. X and D are the design matrices for the allele substitution effects a = (a_{1}, ..., a_{m})' and the dominance effects d = (d_{1}, ..., d_{m})', respectively. The entries of the design matrices are random variables which are realised depending on the observed marker genotypes (denoted as 11, 12, 22). For a homozygous genotype at locus j ∈ {1, ..., m} of animal i ∈ {1, ..., n}, X_{i,j} = ±1 and D_{i,j} = 0; the positive effect is assigned to the more frequent allele. For a heterozygous genotype, X_{i,j} = 0 and D_{i,j} = 1.
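The genotype coding described above can be sketched in a few lines. This is a minimal illustration, not the authors' Fortran implementation: the function name `f_infinity_design` and the input convention (genotypes given as the count of allele 2, so 0 = 11, 1 = 12, 2 = 22) are assumptions made for this example.

```python
import numpy as np

def f_infinity_design(allele2_counts):
    """Return (X, D) for an n x m matrix of allele-2 counts (0 = 11, 1 = 12, 2 = 22)."""
    g = np.asarray(allele2_counts)
    D = (g == 1).astype(float)        # heterozygotes: D = 1, homozygotes: D = 0
    X = 1.0 - g                       # 11 -> +1, 12 -> 0, 22 -> -1
    freq2 = g.mean(axis=0) / 2.0      # frequency of allele 2 at each locus
    X[:, freq2 > 0.5] *= -1.0         # give +1 to the homozygote of the more frequent allele
    return X, D

# Three animals, two loci (hypothetical genotypes)
X, D = f_infinity_design([[0, 2], [1, 2], [0, 1]])
```

At the second locus allele 2 is the more frequent one, so the sign of that column is flipped and genotype 22 receives the coefficient +1.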
This work relies on two assumptions. Firstly, linkage equilibrium (LE) between the different markers is assumed. Then genotypic effects at different loci are independently distributed and the estimation strategy does not depend on the order of markers. Secondly, in order to avoid the estimation of covariance components at intralocus investigations, the additive genetic value and the dominance genetic value are assumed uncorrelated at each locus, i.e. Cov(X_{i,j}a_{j}, D_{i,j}d_{j}) = 0 ∀i,j. This assumption can be fulfilled by reparametrising coefficients coding for the marker genotypes in advance. We apply the method of Álvarez-Castro & Carlborg [22] to obtain an orthogonal decomposition of genetic values. This method involves the genotype frequencies p_{11,j}, p_{12,j}, p_{22,j} at each locus j and does not necessarily depend on Hardy-Weinberg equilibrium (HWE). The method is related to Cockerham's model [23] given HWE. In an F_{∞} model, the genotypic effects G_{j} = (G_{11,j}, G_{12,j}, G_{22,j})' can be written as
The second and third columns of S represent the possible realisations in X and D, respectively. The genotypic values can also be obtained in terms of an additive effect g_{a,j} and a dominance effect g_{d,j} on the orthogonal scale by
where v = p_{11,j} + p_{22,j} − (p_{11,j} − p_{22,j})^{2}. Since the representations (1) and (2) are equivalent [22], the F_{∞} model can be translated into
where the design matrices X_{a} and X_{d} contain the corresponding entries of S_{A,j} (j = 1, ..., m) and relate to the additive and dominance effects on the orthogonal scale, respectively.
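To make the orthogonality requirement concrete, the following sketch builds additive and dominance scores from genotype frequencies and checks numerically that they are centred and uncorrelated. It uses the allele-count form of the statistical coding commonly attributed to [22]; this form, the function name `orthogonal_scores` and the example frequencies are assumptions, and the scaling and sign may differ from the matrix S_{A,j} used in this paper.

```python
import numpy as np

def orthogonal_scores(p11, p12, p22):
    """Additive and dominance scores for genotypes 11, 12, 22 (allele-2 count form)."""
    mean_count = p12 + 2.0 * p22
    v = p11 + p22 - (p11 - p22) ** 2      # equals the variance of the allele count
    additive = np.array([0.0, 1.0, 2.0]) - mean_count
    dominance = np.array([-2.0 * p12 * p22,
                          4.0 * p11 * p22,
                          -2.0 * p11 * p12]) / v
    return additive, dominance

# Hypothetical genotype frequencies that are deliberately not in HWE
freqs = np.array([0.5, 0.3, 0.2])
add, dom = orthogonal_scores(*freqs)

mean_add = freqs @ add             # frequency-weighted mean of the additive scores
mean_dom = freqs @ dom             # frequency-weighted mean of the dominance scores
cov_add_dom = freqs @ (add * dom)  # frequency-weighted covariance (both means are zero)
```

All three quantities vanish up to rounding error, which is the property the reparametrisation is meant to deliver without relying on HWE.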
To obtain numerical stability in later calculations, coefficients of the main genetic effects are additionally standardised. Let p_{j} denote the (estimated) allele frequency at locus j. One possibility is to divide the columns in X_{a} and X_{d} by the standard deviation of the random variable coding the marker genotype for the additive or dominance effects, respectively,
Now the hierarchical structure of M1 can be characterised by the following prior distributions
L*(γ_{s}, λ_{s}) denotes a mixture of a Laplace distribution with zero expectation and a point mass at zero. The mixing probability is γ_{s}, so that Pr(g_{s,j} = 0) = 1 − γ_{s}. Furthermore, λ_{s} denotes a measure of uncertainty about the effects of the genetic variation source s. The hyperparameters γ_{s} and λ_{s} are specified for each source, either additive (s = a) or dominance (s = d).
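As an illustration of this prior, the sketch below draws effects from a mixture of a point mass at zero and a zero-mean Laplace distribution. It assumes that λ_{s} acts as the rate of the Laplace density (λ/2)·exp(−λ|g|), so that the NumPy scale is 1/λ; the function name and the parameter values are hypothetical.

```python
import numpy as np

def sample_spike_laplace(m, gamma, lam, rng):
    """Draw m effects: nonzero with probability gamma, Laplace(rate lam) if nonzero."""
    nonzero = rng.random(m) < gamma
    effects = np.zeros(m)
    # Assumption: lam is a rate, hence NumPy's scale parameter is 1 / lam
    effects[nonzero] = rng.laplace(loc=0.0, scale=1.0 / lam, size=int(nonzero.sum()))
    return effects

rng = np.random.default_rng(1)
g = sample_spike_laplace(m=100_000, gamma=0.05, lam=4.0, rng=rng)
share_nonzero = float((g != 0).mean())
```

Under this parametrisation a nonzero effect has variance 2/λ^{2}, so the marginal prior variance of an effect is γ·2/λ^{2}.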
In a second step, the pairwise epistatic effects are modelled. The genetic effect caused by an interaction between locus j and k is denoted as g_{s,j,k} with s ∈ {aa, ad, da, dd}. The effect is considered additive × additive (aa) only if the individual i is homozygous at the loci j and k. It is considered additive × dominance (ad) when it appears at a homozygous locus j and a heterozygous locus k (j < k), and dominance × additive (da) for the reverse case. The dominance × dominance effect (dd) appears between heterozygous loci. Using the already orthogonalised columns in X_{a} and X_{d}, M1 can be extended to include epistatic effects in a way similar to Kao & Zeng [21]. Let s ∈ {aa, ad, da, dd},
As an example, X_{aa,j,k} = X_{a,j} · X_{a,k}, where the symbol · denotes the elementwise multiplication of columns j and k of X_{a}. Furthermore, X_{ad,j,k} = X_{a,j} · X_{d,k} is calculated to obtain the coefficients for the effect g_{ad,j,k}. This way, four kinds of epistatic effects are modelled for every pair of loci, i.e. 4 · m(m − 1)/2 epistatic effects in total. The prior remains the same as for M1, but we assume that the probability of having nonzero epistatic effects is smaller than the γ_{s} for main effects.
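The construction of the epistatic coefficients can be sketched as follows. The helper `epistatic_columns` is a hypothetical name; it simply forms the four elementwise products for one pair of loci from given matrices X_{a} and X_{d}.

```python
import numpy as np

def epistatic_columns(Xa, Xd, j, k):
    """Elementwise products of main-effect columns for one locus pair with j < k."""
    if j >= k:
        raise ValueError("the pair must be ordered with j < k")
    return {
        "aa": Xa[:, j] * Xa[:, k],   # additive x additive
        "ad": Xa[:, j] * Xd[:, k],   # additive at j, dominance at k
        "da": Xd[:, j] * Xa[:, k],   # dominance at j, additive at k
        "dd": Xd[:, j] * Xd[:, k],   # dominance x dominance
    }

# Two animals, two loci (hypothetical coefficients)
Xa = np.array([[1.0, -1.0], [0.5, 2.0]])
Xd = np.array([[0.0, 1.0], [3.0, -1.0]])
cols = epistatic_columns(Xa, Xd, 0, 1)
```

Computing the columns on demand for each pair, rather than storing all of them, keeps memory use manageable when m is in the thousands.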
2.2 Parameter estimation
The essence of the fBayesB approach is the iterative conditional expectation (ICE) algorithm, which is described in detail by Meuwissen et al. [20]. We only describe the steps which were adapted to account for nonadditive genetic effects. Initially, to remove the population mean μ*, we shift the observed phenotypic values by the estimated mean value. The vector of genetic effects g_{s} has the length m_{s}, where m_{s} = m for s ∈ {a, d} and m_{s} = m(m − 1)/2 for s ∈ {aa, ad, da, dd}. In case of epistasis, the elements are stored in a vector according to a vectorised upper triangular matrix, where only the elements above the diagonal are taken.
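The storage of pairwise effects in a single vector can be illustrated with a small index helper. The sketch assumes 0-based indices and a row-wise ordering of the elements above the diagonal; a column-wise ordering would be equally compatible with the description, so the ordering here is an assumption, as are the function names.

```python
import numpy as np

def n_pairs(m):
    """Number of unordered locus pairs, i.e. elements above the diagonal."""
    return m * (m - 1) // 2

def pair_index(j, k, m):
    """Position of pair (j, k), j < k, in the row-wise vectorised strict upper triangle."""
    return j * m - j * (j + 1) // 2 + (k - j - 1)

m = 6
rows, cols = np.triu_indices(m, k=1)   # row-wise enumeration of all pairs with j < k
positions = [pair_index(int(j), int(k), m) for j, k in zip(rows, cols)]
pairs_reduced_panel = n_pairs(5227)    # marker count of the reduced dataset in this study
```

For the reduced panel of 5 227 markers this gives 13 658 151 pairs per epistatic source, which shows why the number of effects quickly reaches the millions.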
We carry out k = 1, 2, ..., k_{max} iterations and process the genetic effects in the order s = a, d based on M1 or s = a, d, aa, ad, da, dd based on M2. The genetic effect with index j = 1, ..., m_{s} is estimated as the posterior expectation
where denotes the vector of observed phenotypes corrected for all estimated effects except the jth effect in iteration round k.
Set and . For convenience we denote
Now the conditional expectation was determined analytically in Meuwissen et al. [20] as
With
Here Θ_{U}(0; μ, σ^{2}) and Θ_{L}(0; μ, σ^{2}) are the expected values of an upper and a lower truncated normal distribution N(μ, σ^{2}), respectively, with truncation point zero. Φ(x; μ, σ^{2}) denotes the normal distribution function evaluated at some point x and ϕ(x; μ, σ^{2}) is the normal density function.
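The expected values of a normal distribution truncated at zero have closed forms, sketched below with the standard library only. The function names are hypothetical and deliberately describe the retained side of the distribution, because the text does not state which side Θ_{U} and Θ_{L} each refer to; note also that the functions take the standard deviation σ rather than σ^{2}.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_given_positive(mu, sigma):
    """E[X | X > 0] for X ~ N(mu, sigma^2)."""
    z = mu / sigma
    return mu + sigma * std_normal_pdf(z) / std_normal_cdf(z)

def mean_given_negative(mu, sigma):
    """E[X | X < 0] for X ~ N(mu, sigma^2)."""
    z = mu / sigma
    return mu - sigma * std_normal_pdf(z) / std_normal_cdf(-z)

mu, sigma = 0.3, 1.5
# Law of total expectation: weighting both truncated means recovers mu
recovered_mu = (std_normal_cdf(mu / sigma) * mean_given_positive(mu, sigma)
                + std_normal_cdf(-mu / sigma) * mean_given_negative(mu, sigma))
```

For values of μ/σ far in the tail the ratios become numerically unstable, so a production implementation would need a more careful evaluation than this sketch.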
We introduce a slight modification to fBayesB as we update the estimated residual variance components in each iteration k by the residual sum of squares
Then the residual variance is substituted by its updated estimate in the calculation of the conditional expectation. The steps above are carried out for all indices j within each source s of genetic variation. We continue until the vector of estimates fulfils the convergence criterion
otherwise the iterations stop at k = k_{max}. The (direct) genetic value DGV_{i} of individual i is obtained as the genome-wide sum over all genotypic values and over the different sources, i.e. DGV_{i} = Σ_{s} Σ_{j=1}^{m_{s}} X_{s,i,j} ĝ_{s,j}.
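The structure of the iteration (one effect at a time, phenotypes corrected for all other current estimates, stop on a small relative change) can be sketched as follows. This is not fBayesB itself: the per-effect update is replaced by a simple ridge-type shrinkage as a stand-in for the posterior expectation under the Laplace mixture prior, and the convergence rule (squared change relative to squared length of the estimate vector) is an assumption, since the criterion is not reproduced in the text. All names are hypothetical.

```python
import numpy as np

def iterate_single_effects(X, y, shrink, k_max=1000, tol=1e-8):
    """Cycle over effects, updating each one against phenotypes corrected for the rest."""
    n, m = X.shape
    g = np.zeros(m)
    resid = y - X @ g
    xtx = (X ** 2).sum(axis=0)
    for k in range(1, k_max + 1):
        g_old = g.copy()
        for j in range(m):
            y_corr = resid + X[:, j] * g[j]              # add effect j back to the residual
            g_j = X[:, j] @ y_corr / (xtx[j] + shrink)   # stand-in for the posterior mean
            resid = y_corr - X[:, j] * g_j               # correct again with the new value
            g[j] = g_j
        diff = g - g_old
        if diff @ diff <= tol * (g @ g):                 # assumed convergence criterion
            return g, k
    return g, k_max

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(200)
g_hat, rounds = iterate_single_effects(X, y, shrink=5.0, tol=1e-14)
```

With the ridge stand-in, the loop is a Gauss-Seidel solver for the penalised normal equations, which makes it easy to check the bookkeeping of the corrected phenotypes against a direct solution.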
Finally, as a consequence of the standardisation, the genetic variance components are estimated as the sum of squared effect estimates, Σ_{j} ĝ_{s,j}^{2}, for each genetic variation source s. Note that this formula yields an approximation under LD because the covariance components between potentially linked loci j and j' are absent. Under the given reparametrisation, the covariance Cov(X_{s,i,j}, X_{s,i',j'}) between genotype coefficients is not necessarily positive and the signs of the corresponding effects are not known. Therefore, it cannot be stated whether over- or underestimation of genetic variance components is expected. We briefly examine the impact of missing linkage information in our simulations.
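The role of the LE assumption in the variance estimate can be checked numerically. The sketch assumes that the variance component is taken as the sum of squared effects, which is what standardised and mutually uncorrelated columns imply; it uses independent standard normal columns instead of genotype codes, and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 20_000, 50

# Independent, standardised columns mimic loci in linkage equilibrium
X = rng.standard_normal((n, m))
X = (X - X.mean(axis=0)) / X.std(axis=0)
g = rng.normal(0.0, 0.2, size=m)

le_variance = float(np.sum(g ** 2))        # sum of squared effects
empirical_variance = float(np.var(X @ g))  # variance of the resulting genetic values
ratio = empirical_variance / le_variance
```

With uncorrelated columns the two quantities agree closely. Once columns are correlated, as under LD, the cross-products between loci no longer vanish and the sum of squares is only an approximation.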
The suitability of the statistical models M1 and M2 is compared among the different simulated scenarios in terms of accuracy, which is the empirical correlation between predicted and simulated DGV in a validation set. We implemented this fBayesB approach in Fortran 90.
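Prediction and evaluation can be sketched in a few lines, assuming the design matrices and effect estimates are held in dictionaries keyed by the source of genetic variation. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def direct_genetic_values(designs, effects):
    """Sum the contributions X_s g_s over all sources s of genetic variation."""
    return sum(designs[s] @ effects[s] for s in effects)

def prediction_accuracy(predicted, simulated):
    """Empirical correlation between predicted and simulated genetic values."""
    return float(np.corrcoef(predicted, simulated)[0, 1])

designs = {"a": np.eye(3), "d": np.ones((3, 1))}
effects = {"a": np.array([1.0, 2.0, 3.0]), "d": np.array([0.5])}
dgv = direct_genetic_values(designs, effects)
accuracy = prediction_accuracy(dgv, 2.0 * dgv + 1.0)
```

Because the correlation is invariant to shifts and positive rescaling, it measures how well animals are ranked rather than whether the variance components are unbiased, which is why bias is reported separately.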
When studying only the main genetic effects via M1, the results of fBayesB are compared with BayesB [4]. A Fortran implementation of BayesB of Berry & Stranden is available on http://www.genomicselection.net (obtained Sep 4, 2009). This version was extended to include dominance effects using a concatenated matrix (X_{a} X_{d}). In principle, it would be possible to additionally consider epistatic effects in BayesB, but this tool would probably require a few months to finish an adequate number of MCMC sampling rounds for a single simulated dataset.
2.3 Simulation study
Data generation
The simulated population is built up in such a way that it reflects a realistic dairy cattle population. We applied a mutation-drift model and simulated a population with an effective population size of 100 animals and 52 273 single nucleotide polymorphisms (SNPs) on a 30 Morgan genome (in style of the Illumina Chip BovineSNP50 and based on Btau4.0 [24]). Details of the genome setup can be found in Melzer et al. (Melzer, Wittenburg, Repsilber: Simulating a more realistic genotype-phenotype map for development of methods to predict phenotypes based on genome-wide marker data - the livestock perspective, submitted). Starting with homozygous loci, a mutation rate of 2.5 · 10^{-3} per generation was chosen for each SNP locus and 400 generations of random mating involving recombination events on the genome were carried out. About 10% of the loci were fixed due to drift. The LD was measured as r^{2} [25] and the average LD of adjacent SNPs was observed as r^{2} = 0.12. The average SNP heterozygosity was 0.33. The training generations 401 and 402 each consisted of 50 half-sib families with 20 offspring. These individuals were genotyped and phenotyped (n = 2 000). The test generations 403 and 404 were built up the same way but without phenotyping the animals. Two main scenarios were set up which differed in the number of QTL. Either 23 or 230 SNP loci were randomly chosen from loci with minor allele frequency (MAF) > 0.02 in generation 400 to be the QTL. Main genetic effects (i.e. additive and dominance effects) were assigned to all QTL. Motivated by the findings of Hayes and Goddard [26], allele substitution effects were drawn from a gamma distribution with shape parameter α = 0.42 and scale parameter β = 2.619 (23-QTL scenario) or β = 8.282 (230-QTL scenario) similar to Meuwissen et al. [4]. The sign of an allele substitution effect was drawn at random with equal chance.
The degree of dominance was drawn from a normal distribution with mean m = 0.193 and variance τ^{2} = 0.312^{2} [27]. The dominance effect was determined as the product of the absolute allele substitution effect and the degree of dominance. Epistatic effects were included optionally. This means that the genotypic information was used twice: genotypic values were either calculated with main effects only (simulation without epistasis) or included main and epistatic effects (simulation with epistasis). For each source of epistasis, six (57) pairs of SNPs were randomly chosen out of the 23 (230) loci to cause interactions. Epistatic effects were drawn from normal distributions with arbitrary parameters chosen such that epistasis explained approximately 25% of the total genetic variance. Different parameters were used for each source of epistatic variation; the parameters are listed in Table 1. To obtain residual error terms that are comparable between simulations with and without epistasis, the residual variance component was determined depending on the chosen broad-sense heritability of H^{2} ∈ {0.5, 0.3, 0.1}. As an example, H^{2} = 0.5 results in a narrow-sense heritability of h^{2} = 0.474 (h^{2} = 0.307) without (with) simulated epistasis in the 23-QTL scenario. The 23-QTL scenario was repeated 100 times for every H^{2} and the 230-QTL scenario was repeatedly simulated only for H^{2} = 0.5.
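The drawing of main effects and the derivation of the residual variance from the broad-sense heritability can be sketched as follows. The function names are hypothetical, and the mapping of β to the `scale` argument of NumPy's gamma sampler is an assumption about the parametrisation. The residual variance follows from H^{2} = σ_{g}^{2} / (σ_{g}^{2} + σ_{e}^{2}).

```python
import numpy as np

def simulate_main_effects(n_qtl, scale, rng, shape=0.42, dom_mean=0.193, dom_sd=0.312):
    """Draw allele substitution effects a and dominance effects d for n_qtl loci."""
    magnitude = rng.gamma(shape, scale, size=n_qtl)   # assumption: beta is the NumPy scale
    sign = rng.choice([-1.0, 1.0], size=n_qtl)        # random sign with equal chance
    a = sign * magnitude
    degree = rng.normal(dom_mean, dom_sd, size=n_qtl) # degree of dominance
    d = np.abs(a) * degree
    return a, d

def residual_variance(genetic_variance, broad_sense_h2):
    """Residual variance implied by H^2 = var_g / (var_g + var_e)."""
    return genetic_variance * (1.0 - broad_sense_h2) / broad_sense_h2

rng = np.random.default_rng(42)
a, d = simulate_main_effects(n_qtl=23, scale=2.619, rng=rng)
var_e = residual_variance(genetic_variance=3.0, broad_sense_h2=0.3)
```

Fixing the residual variance through H^{2} rather than h^{2} keeps the noise level identical whether or not epistasis contributes to the genetic variance.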
Table 1. Mean (m) and variance (τ^{2}) of normal distributions to simulate epistatic genetic effects
Scale of genetic effects
For convenience, the phenotypes were simulated on the basis of an F_{∞} model, but the genetic effects were estimated on the orthogonal scale. We employed the equivalence between the representations of genotypic values in (1) and (2) to obtain the translation between scales [22]. With no epistasis simulated, the allele substitution effect a_{j} and dominance effect d_{j} were translated into the effects g_{a,j} and g_{d,j} on the orthogonal scale by
If epistasis was simulated, the genetic effects on the orthogonal scale were determined for all locus combinations j and k and the main genetic effects were achieved as the marginal effects. On the F_{∞ }scale, we denote the vector of effects α_{j,k }= (μ, a_{j}, d_{j}, a_{k}, aa_{j,k}, da_{j,k}, d_{k}, ad_{j,k}, dd_{j,k})' and on the orthogonal scale. The translation for a single locus combination was
which directly led to the epistatic effects on the orthogonal scale. Due to the standardisation step in (3), the derived epistatic effect had to be multiplied by the corresponding scaling term. As an example for . The derivation of main genetic effects was more difficult. In order to avoid double counting, we considered the main effects separately and collected the contribution of interactions over the genome while the main effects were set to zero (this vector is denoted as α_{j=0,k=0}). The components of interest were obtained from
Note that the order of loci (either j < k or k < j) is necessary to assign the contribution of epistasis correctly to the different sources of genetic variation. Again, each main genetic effect was multiplied by the relevant standard deviation term.
Hyperparameters and other settings
The parameter λ_{s}, which reflects the prior uncertainty about a genetic effect, was determined indirectly through the choice of the total prior variance. For s ∈ {a, d, aa, ad, da, dd}, we assume that
In this study, we used prior knowledge about the proportion of nonzero effects of the genetic variation source s in the simulated dataset and chose γ_{s} accordingly. In the 23-QTL scenario we set γ_{a} = γ_{d} = 0.005 and γ_{s} = 10^{-6} for s ∈ {aa, ad, da, dd}. In the 230-QTL scenario we applied γ_{a} = γ_{d} = 0.05 and γ_{s} = 10^{-6} for the epistatic effects. We return to the issue of parameter choice in the Discussion section.
Furthermore, to limit the number of iterations, we chose k_{max} = 1 000 and for the convergence criterion we used L = 10^{-8} for M1. Owing to the computational effort we set k_{max} = 200 and L = 10^{-6} for M2. Results are reported only for those repetitions where convergence was achieved.
In BayesB the main genetic effects were estimated simultaneously over the whole genome. A hyperparameter π was required to give the proportion of nonzero genetic effects in total; we set π = 0.005 (π = 0.05) in the 23-QTL (230-QTL) scenario. Furthermore, we carried out 50 000 MCMC iterations (40% were discarded as burn-in) and within each iteration 1 000 rounds of the Metropolis-Hastings algorithm were employed.
Outline of data analysis
To begin with, we used every 10th marker (m = 5 227), which included the true positions of the simulated QTL, in the statistical analysis. With this reduced genotype dataset, we evaluated differences in parameter estimation between fBayesB and BayesB based on the model with additive and dominance effects. A main issue was to study the impact of including or not including pairwise epistatic effects on the accuracy of genetic value prediction with fBayesB. The influence of a varying proportion of genetic variation on the accuracy of prediction was obtained by analysing the data produced with different broad-sense heritabilities. Further, we studied the consequences of spreading the genetic variation over a multitude of loci with almost equal amounts of variation in each source of genetic variation. In a next step, we used all SNP information (m = 52 273) without preselection of loci for the estimation of genetic effects and explored the applicability of fBayesB for a large genotype dataset. Finally, to study practical suitability, we estimated genetic effects in a sample of a heterogeneous stock of mice. Genotype and phenotype data are publicly available at http://gscan.well.ox.ac.uk/ [28].
3 Results
On average 567 loci per dataset had MAF ≤ 0.01. These loci were omitted, but loci deviating from HWE (on average one locus per dataset) were not excluded from the analysis. The average LD between adjacent SNPs was r^{2 }= 0.07 in the reduced genotype dataset with 5 227 SNPs.
We first compare fBayesB and BayesB on the basis of M1. Table 2 shows the average estimated variance components and the average correlation between predicted and simulated genetic values in the 23-QTL scenario. The accuracy differed only slightly between the methods: ρ = 0.98 when no epistasis was simulated and ρ = 0.78 with simulated epistasis. Both in simulations with and without epistasis, the estimated variance components were similarly biased with BayesB and fBayesB, i.e., the relative bias of the estimated additive variance was 2% (7%) and the relative bias of the estimated dominance variance was 13% (26 to 27%) without (with) simulated epistasis. Although fBayesB required only a small fraction of the computing time of BayesB (one second versus about six hours on a 2.93 GHz multiuser system), there was neither a lack of accuracy nor a difference in the bias of variance component estimation.
Table 2. Average estimated variance components (standard deviation in brackets) and average accuracy ρ of genetic value prediction*
The additive and dominance effects were estimated equally well with both BayesB and fBayesB. As an example, Figure 1 displays results for the analysis of a single dataset via fBayesB. It shows that the size and location of large to intermediate additive effects were estimated precisely and the pivotal dominance effects were identified closely. In general, there were nearly no differences in the size of estimates of rather large effects and their position between BayesB and fBayesB. It was observed that via BayesB a lot of tiny genetic effects (but with an effect size > 10^{-4}) were estimated over the whole genome, whereas fBayesB concentrated on the large loci.
Figure 1. Estimates of genetic effects if epistasis was absent in the 23-QTL scenario. (A) Additive and (B) dominance effects for a single dataset via M1 using fBayesB. Filled circles were plotted for each estimated effect > 10^{-4}. Single accuracy of genetic value prediction was 0.946.
M1 and M2 results are compared to study the impact of including or not including pairwise epistatic effects on the accuracy of predicting the genetic values in the test generations. As an example, Additional file 1 shows the estimated main genetic and epistatic effects for a single dataset when main and epistatic effects were simulated and modelled jointly. By visual inspection, the size and position of large main or large epistatic effects were estimated quite well. Small effects, especially concerning dominance, were estimated neither with correct size nor at the simulated position. When epistasis was simulated in the 23-QTL scenario, we obtained an accuracy of 0.781 with M1, which was 5% less than the accuracy based on the correct model M2 for this application, see Table 2. Furthermore, the genetic variance components were underestimated to a larger extent with M1 than with M2. The relative bias of the estimated additive and dominance variance was 7% and 26%, respectively, based on M1 and 5% and 11%, respectively, based on the correct model M2. In the reverse case, when epistasis was modelled and not simulated, the accuracy was 0.959. Hence the loss of accuracy of genetic value prediction was only 2% when the incorrect model M2 was applied. The relative bias of the estimated additive and dominance variance was 1% and 3%, respectively, based on the incorrect model M2 compared to 2% and 13%, respectively, based on the correct model M1. Thus, even with additionally modelled (nuisance) genetic effects in M2, the bias of variance component estimates did not increase for additive and dominance effects. In conclusion, and as expected, we obtained the highest accuracy in the validation set when prediction was done with the true model, i.e. when M1 was applied in simulations without epistasis and M2 was used under simulated epistasis. The loss of accuracy was, however, low when the incorrect model was applied.
The relative proportion of genetic variation that could be assigned to the variation of additive effects was estimated best if the correct model was applied. As an example, in the 23-QTL scenario with simulated epistasis, the true ratio of additive to total genetic variance was 0.613. The estimated ratio was 0.626 based on M2 but 0.884 based on M1, see Table 3.
Additional file 1. The figure shows estimates of genetic effects and location if epistasis was present in the 23-QTL scenario: (a) additive, (b) dominance, (c) additive × additive and (d) additive × dominance effects for a single dataset with M2 using fBayesB. Filled circles were plotted for each estimated effect > 10^{-4}. Location of (e) additive × additive and (f) additive × dominance epistatic effects. Single accuracy of genetic value prediction was 0.851.
Table 3. Average ratio of additive genetic variance to total genetic variance*
The results obtained so far are based on H^{2} = 0.5. The influence of a varying proportion of genetic variation in terms of the broad-sense heritability H^{2} on the accuracy of prediction was investigated. Table 4 displays the decreasing accuracy with decreasing H^{2}. Simulations with H^{2} = 0.3 and H^{2} = 0.5 yielded similar accuracies with M1; accuracy differed by about 3-5%. With M2 the differences in accuracy were 6-12%. With H^{2} = 0.1 the accuracy decreased further by about 11-38%. If the proportion of the genetic variation was 0.1, fBayesB had numerical problems with M2 under the given choice of hyperparameters; the algorithm converged to a final solution only in 40% of the repetitions (90% for H^{2} = 0.3, 99.5% for H^{2} = 0.5). In repetitions that did not converge (H^{2} = 0.1: 3.5%, H^{2} = 0.3: 0.5%, H^{2} = 0.5: 0%) a fluctuating convergence criterion was observed. In all other cases, the algorithm collapsed for no obvious reason.
Table 4. Average accuracy of genetic value prediction depending on broad-sense heritability H^{2}*
To show that we benefit from additionally modelling nonadditive genetic effects if those were simulated, we compared the accuracy of genetic value prediction based on M1 with the accuracy obtained from a conventional model including only additive genetic effects, called M0. Except for constellations with H^{2} = 0.1, the accuracy of M1 was 1-2% (3-4%) higher in simulations without (with) epistasis than the accuracy of M0, see Table 4. If we looked at the 10% of animals with the best predicted additive genetic value (i.e. the breeding value) in simulations with epistasis and H^{2} = 0.5, the accuracy of additive genetic value prediction was 0.618 with M0, 0.621 with M1 and 0.598 with the correct model M2. If we looked at the 10% best animals when epistasis was not simulated, the accuracy of additive genetic value prediction was 0.786 based on M0, 0.774 with the correct model M1 and 0.748 with M2. Thus, model choice had an impact on predicting the total genetic values, but if only the extreme breeding values were of interest, e.g. for selection purposes, prediction with a conventional model (M0) was more precise than with the corresponding true model.
In a further step, we studied the consequences of spreading the genetic variation over a multitude of loci and compared the results obtained with BayesB and fBayesB. Furthermore, the 230-QTL scenario was compared with the outcomes of fBayesB in the 23-QTL case. When epistasis was not simulated in the 230-QTL scenario, the highest accuracy of genetic value prediction was obtained with M1, see Table 5. Although BayesB had a higher relative bias of (11% vs. 1%) but a substantially smaller bias of (30% vs. 284%) than fBayesB, its accuracy was 10% higher. Apparently, dominance has an important impact on genetic value prediction and BayesB could better cope with a larger number of QTL. fBayesB was able to identify large to intermediate effects, see e.g. Figure 2, but small effects could not be precisely uncovered. BayesB was also superior to fBayesB in terms of accuracy and bias of variance component estimation based on M1 in simulations with epistasis, see Table 5. In any case, fBayesB extremely overestimated the variance components of nonadditive effects. With application of M1 and fBayesB, the proportion of additive genetic variation to the total genetic variance was underestimated (overestimated) by about 13% without (with) simulated epistasis (Table 3). On the basis of M2 this proportion was underestimated by about 25 to 36%.
Table 5. Average estimated variance components (standard deviation in brackets) and average accuracy ρ of genetic value prediction*
Figure 2. Estimates of genetic effects if epistasis was absent in the 230-QTL scenario. (A) Additive and (B) dominance effects for a single dataset via M1 using fBayesB. Filled circles were plotted for each estimated effect > 10^{4}. The accuracy of genetic value prediction for this single dataset was 0.814.
The more QTL were simulated, the lower the observed accuracy. If ten times as many QTL were responsible for genetic variation, the accuracy of prediction decreased by about 22-24% based on M1 and 35-49% based on M2. Since the distances between QTL were smaller than in the 23-QTL scenario, we could expect that LD between loci contributed to the bias of the estimated variance components. For that reason we calculated the empirical variances obtained from the predicted effect-specific genetic values in the validation set, where the epistatic contribution was collected in one component. Table 6 shows that, if only a few QTL were present, the missing LD information could be ignored, regardless of whether epistasis was considered. In contrast, the empirical variance components clearly deviated from those estimated under LE in the 230-QTL scenario, especially if epistasis was modelled. Consequently, the reported variance components in Tables 2, 3 and 5 can only be interpreted as approximations.
Table 6. Comparison of empirical variances of predicted genetic values and genetic variance components estimated under LE*
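The difference between the two kinds of variance can be made concrete with a small Python sketch for the additive component only. It assumes genotypes coded 0/1/2, Hardy-Weinberg proportions and the textbook expression 2p(1 - p)a^2 per locus under LE; the function names, the simulated genotypes and the effect sizes are hypothetical and do not reproduce the variance formulas or data of this study.

```python
import numpy as np


def additive_variance_under_le(freq, effects):
    """Sum of 2p(1-p)a^2 over loci; ignores covariances between loci (LE)."""
    freq = np.asarray(freq, dtype=float)
    effects = np.asarray(effects, dtype=float)
    return float(np.sum(2.0 * freq * (1.0 - freq) * effects**2))


def empirical_additive_variance(genotypes, effects):
    """Sample variance of the predicted additive values g = X a."""
    g = np.asarray(genotypes, dtype=float) @ np.asarray(effects, dtype=float)
    return float(np.var(g, ddof=1))


rng = np.random.default_rng(7)
n, m = 5000, 20
p = rng.uniform(0.2, 0.8, size=m)

# Independent loci: genotype counts drawn locus by locus (LE).
X_le = rng.binomial(2, p, size=(n, m))

# Complete LD (toy case): every odd-numbered locus copies its even neighbour.
X_ld = X_le.copy()
X_ld[:, 1::2] = X_le[:, 0::2]

a = np.full(m, 0.1)  # hypothetical additive effects

for label, X in (("LE", X_le), ("LD", X_ld)):
    freq = X.mean(axis=0) / 2.0  # observed allele frequencies
    ratio = empirical_additive_variance(X, a) / additive_variance_under_le(freq, a)
    print(label, "empirical / LE-based variance:", round(ratio, 2))
```

With independent loci the two quantities agree up to sampling error, whereas positively correlated loci with effects of the same sign inflate the empirical variance relative to the LE-based sum. This is the kind of deviation that the empirical variances in Table 6 are meant to reveal.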
Next, we used the genome-wide SNP information in the statistical analysis (m = 52 273). On average, 5 685 loci per dataset were omitted because MAF ≤ 0.01. On average, nine loci deviated from HWE, but these loci were retained. We set γ_{a} = γ_{d} = 0.005 for both QTL scenarios. If only main genetic effects were simulated and modelled in the 23-QTL scenario with H^{2} = 0.5, the additive genetic variance component was obtained as , whereas the dominance variance component was extremely overestimated as (se = 0.354). The accuracy ρ = 0.723 was still reasonably high. In the 230-QTL scenario, the accuracy of prediction was reduced to ρ = 0.513 and (se = 0.251), but (se = 0.308) was not estimated as well as with the reduced SNP set on the basis of M1. Including the pairwise epistatic effects via M2 was not practicable. On the basis of 5 227 SNPs, more than 13 million effects had to be estimated for each source of epistatic variation and fBayesB required an average of six hours to converge. If 52 273 SNP markers are included, then approximately 1.3 billion effects have to be estimated for each of the four sources of epistasis. Though most markers or pairs of markers have no effect, their estimated genetic effects will be small but not exactly zero. It was not feasible to estimate about 5 billion effects via M2 with proper numerical precision owing to limited memory. Furthermore, it is questionable how much computing time would be required to execute several rounds of iteration. Thus, with 52 273 SNP markers, only M1 was applied to the simulated data with epistasis. This led to a reduced accuracy of ρ = 0.611 (ρ = 0.380) in the 23-QTL scenario (230-QTL scenario).
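The effect counts quoted above follow from the number of unordered marker pairs, m(m - 1)/2, for each of the four epistatic sources. The following Python sketch reproduces this arithmetic; the storage requirement assumes one double-precision number (8 bytes) per effect, which is an assumption made here for illustration and not a figure from the study.

```python
from math import comb

EPISTATIC_SOURCES = ("aa", "ad", "da", "dd")
BYTES_PER_EFFECT = 8  # assumption: one double-precision number per effect


def n_marker_pairs(n_markers: int) -> int:
    """Number of unordered marker pairs, m * (m - 1) / 2."""
    return comb(n_markers, 2)


def epistatic_effect_count(n_markers: int) -> int:
    """Effects to estimate over all four sources of epistasis."""
    return len(EPISTATIC_SOURCES) * n_marker_pairs(n_markers)


for m in (5227, 52273):
    pairs = n_marker_pairs(m)
    total = epistatic_effect_count(m)
    gigabytes = total * BYTES_PER_EFFECT / 1e9
    print(f"m = {m}: {pairs:,} pairs per source, {total:,} effects in total, "
          f"about {gigabytes:.1f} GB for one vector of effects")
```

For m = 5 227 this gives about 13.7 million pairs per source, and for m = 52 273 about 1.37 billion pairs per source, i.e. roughly 5.5 billion epistatic effects. Under the 8-byte assumption, a single vector of these effects would already occupy more than 40 GB.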
Finally, in the real data example, we considered m = 9 441 SNPs which passed the standard quality checks on HWE and MAF. The few missing genotypes for these SNPs were imputed via Beagle 3.2 [29]. We studied an immunological phenotype, i.e. the percentage of CD8^{+} cells, and standardised the vector of observations (n = 1 521) to avoid numerical problems. A set of covariates was considered similar to Valdar et al. [30]: gender, age, family, litter, cage density, experimenter, month and year of experiment. Phenotypes were corrected for the least-squares estimates of these factors in each iteration of the fBayesB algorithm [20]. We set γ_{a} = γ_{d} = 0.001 and γ_{s} = 10^{6} for the epistatic effects. Estimates of narrow-sense heritability were similar among the models (M0: h^{2} = 0.294, M1: h^{2} = 0.295, M2: h^{2} = 0.317), which shows the robustness of fBayesB in terms of additive genetic variation, see Table 7. Broad-sense heritability increased with growing model complexity (M1: H^{2} = 0.347, M2: H^{2} = 0.448). Figures depicting estimated effect sizes are given in Additional file 2. The largest effects were observed in the MHC region on chromosome 17, which was also reported by Valdar et al. [28]. In total, 88% (65%) of the genetic variation was observed around the MHC with M1 (M2). Though additive genetic effect sizes were nearly the same with all models, an additional dominance effect appeared with M2 on chromosome 17. Furthermore, a large epistatic effect occurred between chromosomes 1 and 8. Thus, adding epistatic effects to a statistical model may not necessarily improve genetic value prediction, as investigated by Lee et al. [7] (see Section Background), but it helps to specify sources of genetic variation and to identify loci that contribute to variation only through interactions.
Table 7. Estimated variance components for the real data example*
Additional file 2. The fBayesB approach was applied to public data on a heterogeneous stock of mice. Genetic effects were estimated based on the different models including only additive effects (M0), additive and dominance effects (M1), additive, dominance and pairwise epistatic effects (M2).
4 Discussion
4.1 Hyperparameters and convergence
When we investigated the influence of a varying proportion of genetic to phenotypic variance on genetic value prediction in the 23-QTL scenario, we observed that fBayesB did not fulfil the convergence criterion in all situations. In the extreme case with M2 and H^{2} = 0.1, only 40% of all repetitions converged to a proper final solution, and in some cases fBayesB simply aborted. (Usually, the algorithm converged after 13-16 iterations with M1 and after 26-28 iterations with M2, but up to five times as many iterations were necessary in the 230-QTL scenario.) In order to avoid termination, one could tune the "free" hyperparameter λ_{s}, which is responsible for the variation of a genetic effect a priori. For convenience, we assumed that the total prior variance was equal to one for each source of genetic variation s ∈ {a, d, aa, ad, da, dd}, see Equation (4). This prior guess depends on the hyperparameter γ_{s}, which was equal among s ∈ {a, d} and among s ∈ {aa, ad, da, dd}. Thus, it seems necessary to adjust λ_{s} and/or γ_{s} specifically to each source of genetic variation.
4.2 Proportion of nonzero effects
A preliminary study showed that the choice of the hyperparameter γ_{s} strongly influenced the accuracy of genetic value prediction and the ability of the fBayesB algorithm to converge (Melzer, Wittenburg, Repsilber: Simulating a more realistic genotype-phenotype map for development of methods to predict phenotypes based on genome-wide marker data - the livestock perspective, submitted). Since the aim of this paper was to investigate the suitability of fBayesB to cope with nonadditive effects in general, we simply used prior knowledge about the proportion of nonzero genetic effects when the hyperparameter γ_{s} had to be specified for each genetic variation source s ∈ {a, d, aa, ad, da, dd}. There are several possibilities which allow for a flexible setting of this hyperparameter. Intuitively, one could determine γ_{s} via cross-validation, as was done in Melzer et al. Another encouraging approach, called emBayesB, was presented by Shepherd et al. [31]. It is a BayesB-like estimation of SNP effects without the time-consuming Metropolis-Hastings algorithm, but with an EM algorithm for the estimation of γ_{s}. It employs a binary variable indicating whether a marker is in LD with a QTL (i.e., whether it has a nonzero effect). So far, this approach was verified for additive genetic effects via simulations, but it is certainly applicable to the nonadditive case as well. However, the specification of the proportion of nonzero effects in the prior setting is not the only important factor. This study additionally showed that the more loci were responsible for genetic variation, the worse the genetic parameters were estimated, even though we accounted for this proportion in γ_{s}. With higher proportions of small to intermediate genetic effects, the estimation bias of fBayesB accumulates seriously. One way out is to reduce noise by eliminating zero effects. This objective is discussed in the next section.
4.3 Reduction of model dimensionality
SNP density continues to increase; soon whole-genome sequences will be used for statistical analysis [32]. The ability to uncover genetic effects with Bayesian MCMC methods worsens with increasing LD due to redundancy between markers [33]. Thus, in order to deal with the huge amounts of data, it becomes important to select relevant information. Selection is conceivable in two general ways.
In order to keep only as many parameters as required in the statistical model, one could apply a filtering procedure. The significance of putative nonzero effects might be determined, for example, via a stochastic variable selection approach (SVS). In the field of genomic selection, which is based only on additive effects, an SVS implementation of Meuwissen and Goddard [34] was applied by Calus et al. [35] to simulated data as well as by Verbyla et al. [36] to dairy cattle data. For additional nonadditive effects, SVS was developed and already successfully applied to obesity data in a mouse backcross population [37]. In that work, an upper bound of model dimensionality had to be fixed and indicator variables were involved specifying which main and epistatic effects had to be included in the model. The Bayes factor then gave evidence of putative QTL.
Dimensionality can also be reduced nonparametrically. As an example, a subset of SNPs may be selected via filtering based on entropy information and wrapping using a naive Bayesian classifier [38]. Alternatively, an informative set of SNPs, called tag SNPs, can be identified on the basis of LD between loci [39]. This strategy would probably also reduce the bias in variance component estimation due to LD, because only one marker represents a certain chromosome segment. A haplotyping strategy based on LD information was applied to SNP data in Australian beef cattle [40] but with limited success. The authors reported that about 30 000 SNP markers (and a large number of phenotypic records) are required for accurate breeding value prediction. Thus, we have to work with a contradiction: more markers for higher accuracy but fewer markers (or only the best markers) to reduce estimation errors. The best solution is probably obtained when the models used are better able to distinguish between markers with and without effect. Meuwissen [41] presented other options to reduce a set of SNPs based on LD between loci or relatedness between individuals.
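The idea of letting one marker represent a chromosome segment can be illustrated with a simple greedy pruning rule in Python. This is a simplified sketch and not the tag SNP algorithm of Carlson et al. [39]: it assumes genotypes coded 0/1/2, uses the squared correlation between genotype counts as a proxy for LD, and the threshold of 0.8 and the function name are hypothetical.

```python
import numpy as np


def prune_by_ld(genotypes, r2_threshold=0.8):
    """Greedy pruning: keep a marker only if its squared genotype
    correlation with every marker already kept is below the threshold."""
    X = np.asarray(genotypes, dtype=float)
    # Monomorphic markers carry no information and have undefined correlation.
    polymorphic = np.flatnonzero(X.std(axis=0) > 0)
    r2 = np.atleast_2d(np.corrcoef(X[:, polymorphic], rowvar=False)) ** 2
    kept = []  # positions within the polymorphic subset
    for j in range(len(polymorphic)):
        if all(r2[j, k] < r2_threshold for k in kept):
            kept.append(j)
    # Map back to column indices of the original genotype matrix.
    return [int(polymorphic[j]) for j in kept]


# Toy data (assumption): marker 1 duplicates marker 0, marker 2 is
# independent, marker 3 is monomorphic.
rng = np.random.default_rng(3)
n = 500
snp0 = rng.binomial(2, 0.4, size=n)
snp2 = rng.binomial(2, 0.3, size=n)
X = np.column_stack([snp0, snp0, snp2, np.zeros(n)])

print(prune_by_ld(X))
```

In this toy example only markers 0 and 2 are retained. A rule of this kind reduces redundancy between markers, at the price that the retained set depends on the marker order and on the chosen threshold.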
4.4 Nonadditive effects
This study has shown that the inclusion of dominance effects in genetic value prediction improved accuracy compared to purely additive models (Table 4). We found that the incorporation of dominance effects was less challenging than the inclusion of epistasis, and we have taken a robust step towards advancing insight into the genetic architecture. Regardless of whether dominance or epistatic effects are considered, adequate data are required to estimate nonadditive effects. This is also true for periodic re-estimation of genetic effects. In contrast to genomic selection, where additive effects may be obtained from average yields of progeny of genotyped parents, genotyped individuals need to have their own phenotype (e.g. cows).
In general, and as confirmed in our investigations, parametric methods have difficulty identifying and estimating epistatic effects. One reason is that the orthogonal decomposition of genetic effects only leads to proper results under idealised conditions (LE, absence of mutation and selection etc.) which are violated in practice [42]. As reviewed and discussed by Calus [5], nonparametric methods (e.g. [43]) have the potential to outperform parametric approaches if nonadditive effects are included. With an application to broiler data [44], it was shown that kernel methods had a better predictive ability than parametric methods when genome-wide markers were used. For thousands of SNPs and millions of interactions, fBayesB is still computationally feasible but it shows an inherent bias of variance component estimation. Alternatively, machine learning techniques may discover hidden patterns of gene interaction without assuming their structure [45].
Once gene interactions are discovered, they may be used for mate allocation in livestock breeding, where individuals are mated to achieve favourable nonadditive gene combinations to further increase genetic gain [46]. Apart from breeding applications, improved statistical modelling [41] and our cognitive interest in the formation of complex phenotypes will benefit from knowledge about the distribution of nonadditive effects over the genome and their size.
4.5 Number of simulated QTL
An increase in the number of QTL was accompanied by a reduction in the quality of fBayesB for genetic value prediction. fBayesB was able to identify only the biggest QTL effects in the simulated scenarios, in which (nearly) the same amount of genetic variation was spread over 23 or 230 QTL. Thus, effect size in the 230-QTL scenario was roughly one-tenth of that in the 23-QTL case. This complicated the identification of genetic effects in general and, in particular, of nonadditive effects, which contributed very little to the genetic variance when compared with additive effects. Many tiny effects were estimated with BayesB, even if genetic variation was caused by a few QTL with large effects. In both QTL scenarios, accuracy of genetic value prediction was at a high level with BayesB. It may be more realistic to assume that most livestock traits are influenced by many loci, and therefore the best results can be expected with BayesB.
5 Conclusion
This simulation study showed that the fast Bayesian method (fBayesB) is convenient for genetic value prediction. It requires only a fraction of the computing time of the conventional MCMC approach BayesB and also enables the estimation of pairwise interactions.
The number of simulated QTL, the proportion of genetic to phenotypic variance as well as the number of SNPs in the statistical analyses influenced the accuracy of genetic value prediction and the bias of variance component estimation. Both methods obtained similar results when few QTL with additive and dominance effects were simulated; the maximum accuracy was 98%. As expected, the best results were obtained on the basis of the true model corresponding to the simulated scenario, but the loss of accuracy due to using the incorrect model was limited to 25%. If many QTL were responsible for genetic variation, accuracy decreased by about 22-49% with fBayesB compared to the few-QTL scenario, depending on the model. Accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model, regardless of whether epistasis was simulated, and an additional gain of 4-10% in accuracy was observed with BayesB. To sum up, existing approaches for genome-wide estimation of additive genetic effects can easily and robustly be extended by dominance effects to improve the accuracy of genetic value prediction and to get further insight into the genetic architecture. In this simulation study, the inclusion of dominance was more important than involving all pairwise interactions, which did not improve prediction in general.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
DW implemented the statistical methods, carried out the analysis and wrote the manuscript. NM simulated the datasets and contributed to the data analysis. NR raised the initial question, advised on the research and suggested improvements to the manuscript. All authors have read and approved the final manuscript.
Acknowledgements
This study is part of the FUGATO project "Bovine Integrative Bioinformatics for Genomic Selection (BovIBI)" with financial support of the German Federal Ministry of Education and Research (BMBF).
References

1. Visscher PM, Macgregor S, Benyamin B, Zhu G, Gordon S, Medland S, Hill WG, Hottenga JJ, Willemsen G, Boomsma DI, Liu YZ, Deng HW, Montgomery GW, Martin NG: Genome partitioning of genetic variation for height from 11,214 sibling pairs. American Journal of Human Genetics 2007, 81(5):1104-1110.
2. Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME: Invited review: Genomic selection in dairy cattle: progress and challenges. Journal of Dairy Science 2009, 92(2):433-443.
3. Legarra A, Robert-Granié C, Manfredi E, Elsen JM: Performance of genomic selection in mice. Genetics 2008, 180:611-618.
4. Meuwissen TH, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157(4):1819-1829.
5. Calus MPL: Genomic breeding value prediction: methods and procedures. animal 2010, 4(02):157-164.
6. Melchinger AE, Utz HF, Piepho HP, Zeng ZB, Schön CC: The Role of Epistasis in the Manifestation of Heterosis: A Systems-Oriented Approach. Genetics 2007, 177(3):1815-1825. [http://www.genetics.org/content/177/3/1815.abstract]
7. Lee SH, van der Werf JHJ, Hayes BJ, Goddard ME, Visscher PM: Predicting unobserved phenotypes for complex traits from whole-genome SNP data. PLoS Genetics 2008, 4(10):e1000231.
8. Hu Z, Li Y, Song X, Han Y, Cai X, Xu S, Li W: Genomic value prediction for quantitative traits under the epistatic model. BMC Genetics 2011, 12:15. [http://www.biomedcentral.com/14712156/12/15]
9. Hill WG, Goddard ME, Visscher PM: Data and theory point to mainly additive genetic variance for complex traits. PLoS Genetics 2008, 4(2):e1000008.
10. Carlborg Ö, Haley CS: Epistasis: too often neglected in complex trait studies? Nature Reviews Genetics 2004, 5(8):618-625.
11. Carlborg Ö, Kerje S, Schütz K, Jacobsson L, Jensen P, Andersson L: A global search reveals epistatic interaction between QTL for early growth in the chicken. Genome Research 2003, 13(3):413-421.
12. Beavis W: QTL analyses: power, precision, and accuracy. CRC Press; 1998.
13. Rönnegård L, Besnier F, Carlborg Ö: An Improved Method for Quantitative Trait Loci Detection and Identification of Within-Line Segregation in F_{2} Intercross Designs. Genetics 2008, 178:2315-2326.
14. Zimmer D, Mayer M, Reinsch N: Complex Genetic Effects in Quantitative Trait Locus Identification: A Computationally Tractable Random Model for Use in F2 Populations. Genetics 2011, 187:261.
15. Xu S: Estimating polygenic effects using markers of the entire genome. Genetics 2003, 163(2):789-801.
16. Xu S: An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 2007, 63(2):513-521.
17. Fernando RL, Habier D, Stricker C, Dekkers JCM, Totir LR: Genomic Selection. Acta Agriculturae Scandinavica, Section A - Animal Sciences 2007, 57:192-195.
18. Habier D, Fernando RL, Dekkers JCM: The impact of genetic relationship information on genome-assisted breeding values. Genetics 2007, 177(4):2389-2397.
19. Solberg TR, Sonesson AK, Woolliams JA, Meuwissen TH: Reducing dimensionality for prediction of genome-wide breeding values. Genetics Selection Evolution 2009, 41:29.
20. Meuwissen THE, Solberg TR, Shepherd R, Woolliams JA: A fast algorithm for BayesB type of prediction of genome-wide estimates of genetic value. Genetics Selection Evolution 2009, 41:2. [http://www.gsejournal.org/content/41/1/2]
21. Kao CH, Zeng ZB: Modeling epistasis of quantitative trait loci using Cockerham's model. Genetics 2002, 160(3):1243-1261.
22. Álvarez-Castro JM, Carlborg Ö: A unified model for functional and statistical epistasis and its application in quantitative trait Loci analysis. Genetics 2007, 176(2):1151-1167.
23. Cockerham CC: An Extension of the Concept of Partitioning Hereditary Variance for Analysis of Covariances among Relatives When Epistasis Is Present. Genetics 1954, 39(6):859-882.
24. The Bovine Genome Sequencing and Analysis Consortium: The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science 2009, 324(5926):522-528.
25. Hill W, Robertson A: Linkage disequilibrium in finite populations. TAG Theoretical and Applied Genetics 1968, 38(6):226-231.
26. Hayes B, Goddard ME: The distribution of the effects of genes affecting quantitative traits in livestock. Genetics Selection Evolution 2001, 33(3):209-229.
27. Bennewitz J, Meuwissen THE: The distribution of QTL additive and dominance effects in porcine F2 crosses. Journal of Animal Breeding and Genetics 2010, 127(3):171-179.
28. Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, Cookson WO, Taylor MS, Rawlins JNP, Mott R, Flint J: Genome-wide genetic association of complex traits in heterogeneous stock mice. Nature Genetics 2006, 38(8):879-887.
29. Browning SR, Browning BL: Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. American Journal of Human Genetics 2007, 81(5):1084-1097.
30. Valdar W, Solberg LC, Gauguier D, Cookson WO, Rawlins JNP, Mott R, Flint J: Genetic and environmental effects on complex traits in mice. Genetics 2006, 174(2):959-984.
31. Shepherd RK, Meuwissen TH, Woolliams JA: Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers. BMC Bioinformatics 2010, 11:529.
32. Meuwissen T, Goddard M: Accurate prediction of genetic values for complex traits by whole-genome resequencing. Genetics 2010, 185(2):623-631.
33. Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R: Additive genetic variability and the Bayesian alphabet. Genetics 2009, 183:347-363.
34. Meuwissen THE, Goddard ME: Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genetics Selection Evolution 2004, 36(3):261-279.
35. Calus MPL, Meuwissen THE, de Roos APW, Veerkamp RF: Accuracy of genomic selection using different methods to define haplotypes. Genetics 2008, 178:553-561.
36. Verbyla KL, Hayes BJ, Bowman PJ, Goddard ME: Accuracy of genomic selection using stochastic search variable selection in Australian Holstein Friesian dairy cattle. Genet Res 2009, 91(5):307-311.
37. Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D: Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics 2005, 170(3):1333-1344.
38. Long N, Gianola D, Rosa GJM, Weigel KA, Avendaño S: Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers. Journal of Animal Breeding and Genetics 2007, 124(6):377-389.
39. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA: Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. American Journal of Human Genetics 2004, 74:106-120.
40. Hayes BJ, Chamberlain AJ, McPartlan H, Macleod I, Sethuraman L, Goddard ME: Accuracy of marker-assisted selection with single markers and marker haplotypes in cattle. Genetical Research 2007, 89(4):215-220.
41. Meuwissen T: Use of whole genome sequence data for QTL mapping and genomic selection. In Proceedings of the 9th World Congress on Genetics Applied to Livestock Production. Leipzig, Germany: Gesellschaft für Tierzuchtwissenschaften e.V; 2010. [Abstract ID 0018, ISBN 9783000316081]
42. Gianola D, de los Campos G: Inferring genetic values for quantitative traits nonparametrically. Genetics Research 2008, 90(6):525-540.
43. Gianola D, Fernando RL, Stella A: Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics 2006, 173(3):1761-1776.
44. González-Recio O, Gianola D, Long N, Weigel KA, Rosa GJM, Avendaño S: Nonparametric Methods for Incorporating Genomic Information Into Genetic Evaluations: An Application to Mortality in Broilers. Genetics 2008, 178:2305-2313.
45. Gianola D, de los Campos G, González-Recio O, Long N, Okut H, Rosa GJM, Weigel KA, Wu XL: Statistical Learning Methods For Genome-based Analysis Of Quantitative Traits. In Proceedings of the 9th World Congress on Genetics Applied to Livestock Production. Leipzig, Germany: Gesellschaft für Tierzuchtwissenschaften e.V; 2010. [Abstract ID 0014, ISBN 9783000316081]
46. Goddard ME, Hayes BJ: Genomic selection. Journal of Animal Breeding and Genetics 2007, 124:323-330.