Genome position specific priors for genomic prediction
1 Centre for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, Tjele, 8830, Denmark
2 Biosciences Research Division, Department of Primary Industries Victoria, Bundoora, 3083, Australia
3 Land and Food Resources, University of Melbourne, Parkville, 3072, Australia
4 Dairy Futures Cooperative Research Centre, Bundoora, Victoria, 3083, Australia
5 La Trobe University, Bundoora, Victoria, 3086, Australia
BMC Genomics 2012, 13:543. doi:10.1186/1471-2164-13-543. Published: 10 October 2012
The accuracy of genomic prediction is highly dependent on the size of the reference population. For small populations, including information from other populations could improve this accuracy. The usual strategy is to pool data from different populations; however, this has not proven as successful as hoped for with distantly related breeds. BayesRS is a novel approach to sharing information across populations for genomic prediction. The approach allows information to be captured even where the phase of SNP alleles and causative mutation alleles is reversed across populations, or where the actual causative mutation differs between the populations but affects the same gene. Proportions of a four-distribution mixture for SNP effects in segments of fixed size along the genome are derived from one population and set as location specific prior proportions of distributions of SNP effects for the target population. The model was tested using dairy cattle populations of different breeds: 540 Australian Jersey bulls, 2297 Australian Holstein bulls and 5214 Nordic Holstein bulls. The traits studied were protein, fat and milk yield. Genotype data were Illumina 777K SNP genotypes, real or imputed.
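The segment-wise prior described above can be illustrated with a minimal sketch. It assumes that, for each SNP in the source population, a vector of four mixture-membership probabilities is already available, and that the segment prior is simply their average within the segment. The function names, the averaging rule, and the example segment size are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def segment_prior_proportions(
    snp_probs: Sequence[Sequence[float]],
    segment_size: int,
) -> List[List[float]]:
    """Average per-SNP mixture probabilities within fixed-size segments.

    snp_probs[i][k] is the (assumed) probability that SNP i belongs to
    mixture component k in the source population. SNPs are taken to be
    ordered by genome position. The last segment may be shorter.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    priors: List[List[float]] = []
    for start in range(0, len(snp_probs), segment_size):
        block = snp_probs[start:start + segment_size]
        n_components = len(block[0])
        sums = [0.0] * n_components
        for probs in block:
            for k, p in enumerate(probs):
                sums[k] += p
        # Proportions for this segment: mean membership per component.
        priors.append([s / len(block) for s in sums])
    return priors


def expand_to_snps(
    priors: Sequence[Sequence[float]],
    n_snps: int,
    segment_size: int,
) -> List[Sequence[float]]:
    """Give every SNP in the target population its segment's prior."""
    return [priors[i // segment_size] for i in range(n_snps)]


# Hypothetical toy input: five SNPs, four mixture components, segments of two.
toy_probs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]
toy_priors = segment_prior_proportions(toy_probs, segment_size=2)
toy_per_snp = expand_to_snps(toy_priors, n_snps=5, segment_size=2)
```

Because the prior is attached to a genome segment rather than to an individual SNP allele, it carries no sign or phase information, which is consistent with the claim that the approach tolerates reversed phase between populations.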
Results showed an increase in accuracy of up to 3.5% for the Jersey population when using BayesRS with a prior derived from Australian Holstein, compared to a model without location specific priors. The increase in accuracy was, however, lower than that achieved when reference populations were combined to estimate SNP effects, except in the case of fat yield. Because the Jersey validation set was small, these improvements in accuracy were not significant at the 5% level using a Hotelling-Williams t-test. An increase in accuracy of 1-2% for all traits was observed in the Australian Holstein population when using a prior derived from the Nordic Holstein population, compared to using no prior information. These improvements were significant (P<0.05) using the Hotelling-Williams t-test for protein and fat yield.
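For readers unfamiliar with the Hotelling-Williams t-test, the sketch below computes the statistic in the form given by Williams (1959) and Steiger (1980) for comparing two dependent correlations that share one variable. It assumes that variable 1 is the observed phenotype in the validation set and that variables 2 and 3 are the predictions from the two models being compared; this mapping and the function name are assumptions for illustration, not a description of the authors' code.

```python
import math
from typing import Tuple


def hotelling_williams_t(
    r12: float, r13: float, r23: float, n: int
) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for H0: rho12 == rho13.

    r12, r13: correlations of the shared variable with each predictor.
    r23: correlation between the two predictors.
    n: number of validation animals.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    # Determinant of the 3x3 correlation matrix.
    det = 1.0 - r12**2 - r13**2 - r23**2 + 2.0 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2.0
    denom = 2.0 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1.0 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1.0 + r23) / denom)
    return t, n - 3


# Hypothetical values, not results from the paper.
t_stat, dof = hotelling_williams_t(r12=0.5, r13=0.4, r23=0.8, n=100)
```

The statistic is compared against a t distribution with n - 3 degrees of freedom. With a small validation set the (n - 1) factor is small, so a given difference in accuracy yields a smaller t value, which is in line with the non-significant Jersey result.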
For some traits the method might be advantageous compared to pooling of reference data for distantly related populations, but further investigation is needed to confirm the results. For closely related populations the method does not perform better than pooling reference data. However, it does give an increased accuracy compared to analysis based on only one reference population, without an increased computational burden. The approach described here provides a general setup for inclusion of location specific priors: the approach could be used to include biological information in genomic predictions.