Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat
1 Dept. of Animal Sciences, University of Wisconsin, Madison, 53706, USA
2 Dept. of Dairy Science, University of Wisconsin, Madison, 53706, USA
3 Dept. of Biostatistics and Medical Informatics, University of Wisconsin, Madison, 53706, USA
4 Dept. of Animal Sciences, Biometry and Genetics Branch, University of Yuzuncu Yil, Van, 65080, Turkey
BMC Genetics 2011, 12:87 doi:10.1186/1471-2156-12-87Published: 7 October 2011
In the study of associations between genomic data and complex phenotypes there may be relationships that are not amenable to parametric statistical modeling. Such associations have been investigated mainly using single-marker and Bayesian linear regression models that differ in their distributions, but that assume additive inheritance while ignoring interactions and non-linearity. When interactions have been included in the model, their effects have entered linearly. There is a growing interest in non-parametric methods for predicting quantitative traits based on reproducing kernel Hilbert spaces regressions on markers and radial basis functions. Artificial neural networks (ANN) provide an alternative, because these act as universal approximators of complex functions and can capture non-linear relationships between predictors and responses, with the interplay among variables learned adaptively. ANNs are interesting candidates for analysis of traits affected by cryptic forms of gene action.
We investigated various Bayesian ANN architectures using for predicting phenotypes in two data sets consisting of milk production in Jersey cows and yield of inbred lines of wheat. For the Jerseys, predictor variables were derived from pedigree and molecular marker (35,798 single nucleotide polymorphisms, SNPS) information on 297 individually cows. The wheat data represented 599 lines, each genotyped with 1,279 markers. The ability of predicting fat, milk and protein yield was low when using pedigrees, but it was better when SNPs were employed, irrespective of the ANN trained. Predictive ability was even better in wheat because the trait was a mean, as opposed to an individual phenotype in cows. Non-linear neural networks outperformed a linear model in predictive ability in both data sets, but more clearly in wheat.
Results suggest that neural networks may be useful for predicting complex traits using high-dimensional genomic information, a situation where the number of unknowns exceeds sample size. ANNs can capture nonlinearities, adaptively. This may be useful when prediction of phenotypes is crucial.