Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle
1 Reprogen - Animal Bioscience, Faculty of Veterinary Science, University of Sydney, 425 Werombi Road, Camden, NSW, 2570, Australia
2 Department of Primary Industries, Biosciences Research Division, Bundoora, Victoria, 3083, Australia
3 Dairy Futures Cooperative Research Centre (CRC), Bundoora, Victoria, Australia
BMC Genomics 2012, 13:538 doi:10.1186/1471-2164-13-538Published: 8 October 2012
We investigated strategies and factors affecting accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K, 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium (Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated the utility of imputed genotypes on the accuracy of genomic selection using Australian Holstein-Friesian cattle data from 2727 and 845 animals genotyped with 50K and 800K SNP chip, respectively. Animals were divided into reference and test sets (genotyped with higher and lower density SNP panels, respectively) for evaluating the accuracies of imputation. For the accuracy of genomic selection, a comparison of direct genetic values (DGV) was made by dividing the data into training and validation sets under a range of imputation scenarios.
Of the three methods compared for imputation, IMPUTE2 outperformed Beagle and fastPhase for almost all scenarios. Higher SNP densities in the test animals, larger reference sets and higher relatedness between test and reference animals increased the accuracy of imputation. 50K specific genotypes were imputed with moderate allelic error rates from 15K (2.85%) and 25K (2.75%) genotypes. Using IMPUTE2, SNP genotypes up to 800K were imputed with low allelic error rate (0.79% genome-wide) from 50K genotypes, and with moderate error rate from 3K (4.78%) and 7K (2.00%) genotypes. The error rate of imputing up to 800K from 3K or 7K was further reduced when an additional middle tier of 50K genotypes was incorporated in a 3-tiered framework. Accuracies of DGV for five production traits using imputed 50K genotypes were close to those obtained with the actual 50K genotypes and higher compared to using 3K or 7K genotypes. The loss in accuracy of DGV was small when most of the training animals also had imputed (50K) genotypes. Additional gains in DGV accuracies were small when SNP densities increased from 50K to imputed 800K.
Population-based genotype imputation can be used to predict and combine genotypes from different low, medium and high-density SNP chips with a high level of accuracy. Imputing genotypes from low-density SNP panels to at least 50K SNP density increases the accuracy of genomic selection.