Methods of tagSNP selection and other variables affecting imputation accuracy in swine
1 Department of Animal Science, Michigan State University, East Lansing, MI, USA
2 Department of Fisheries & Wildlife, Michigan State University, East Lansing, MI, USA
3 The Maschhoffs, Carlyle, IL, USA
4 National Swine Registry, West Lafayette, IN, USA
5 Bovine Functional Genomics Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, USA
BMC Genetics 2013, 14:8 doi:10.1186/1471-2156-14-8
Published: 21 February 2013
Genotype imputation is a cost-efficient alternative to the use of high-density genotypes for implementing genomic selection. The objective of this study was to investigate variables affecting imputation accuracy from low-density tagSNP sets in swine (average distance between tagSNP from 100 kb to 1 Mb), selected using LD information, physical location, or accuracy for genotype imputation. We compared imputation accuracy across several low-density tagSNP sets of varying densities, selected using three different methods. In addition, we assessed the effect of varying the size and composition of the reference panel of haplotypes used for imputation.
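As an illustration of what selecting tagSNP by physical location could look like, the following minimal Python sketch picks, for each point on an evenly spaced grid along one chromosome, the nearest available SNP. This is one plausible reading of location-based selection, not necessarily the procedure used in this study; the function name `select_evenly_spaced`, its parameters, and the toy positions are hypothetical.

```python
from bisect import bisect_left


def select_evenly_spaced(positions, spacing):
    """Pick the SNP nearest to each grid point on one chromosome.

    positions: sorted list of SNP positions in base pairs (hypothetical input).
    spacing:   target distance between tagSNP in base pairs.
    Assumption: grid points start at the first SNP and advance by `spacing`.
    """
    if not positions:
        return []
    selected = []
    target = positions[0]
    last = positions[-1]
    while target <= last:
        i = bisect_left(positions, target)
        # The nearest SNP is either just left or just right of the grid point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        best = min(candidates, key=lambda j: abs(positions[j] - target))
        # Avoid selecting the same SNP twice when markers are sparse.
        if not selected or positions[best] != selected[-1]:
            selected.append(positions[best])
        target += spacing
    return selected


# Toy example with made-up positions and a spacing of 100 bp.
toy_positions = [0, 40, 95, 110, 200, 280, 310, 400]
toy_tags = select_evenly_spaced(toy_positions, 100)
```

In practice the spacing would be on the order of the 100 kb to 1 Mb range stated above, and the selection would be repeated per chromosome.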
TagSNP density of at least 1 tagSNP per 340 kb (∼7000 tagSNP), selected using pairwise LD information, was necessary to achieve average imputation accuracy higher than 0.95. A commercial low-density (9K) tagSNP set for swine was developed concurrently with this study, and average imputation accuracy based on these tagSNP was estimated at 0.951. Construction of a haplotype reference panel was most efficient when haplotypes were obtained from randomly sampled individuals. Increasing the size of the original reference haplotype panel (128 haplotypes sampled from 32 sire/dam/offspring trios phased in a previous study) led to an overall increase in imputation accuracy (IA = 0.97 with 512 haplotypes), and was especially useful for increasing imputation accuracy of SNP with MAF below 0.1 and of SNP located in the chromosomal extremes (within 5% of a chromosome end).
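The figures above can be cross-checked with simple arithmetic. The sketch below assumes tagSNP are spread evenly, so that count times average spacing gives the genome length covered, and assumes the 128 haplotypes arise from counting the four parental haplotypes in each trio. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def implied_length_bp(n_tagsnp, spacing_bp):
    """Genome length covered by n evenly spaced tagSNP (assumption: even spacing)."""
    return n_tagsnp * spacing_bp


def tagsnp_count(covered_length_bp, spacing_bp):
    """Number of tagSNP needed to cover a length at a given average spacing."""
    return round(covered_length_bp / spacing_bp)


def trio_haplotypes(n_trios):
    """Parental haplotypes per set of trios.

    Assumption: each sire/dam/offspring trio contributes 4 haplotypes
    (2 from the sire, 2 from the dam).
    """
    return 4 * n_trios


covered = implied_length_bp(7000, 340_000)   # length implied by ~7000 tagSNP at 340 kb
n_at_100kb = tagsnp_count(covered, 100_000)  # densest spacing considered (100 kb)
n_at_1mb = tagsnp_count(covered, 1_000_000)  # sparsest spacing considered (1 Mb)
n_haps = trio_haplotypes(32)                 # matches the 128 haplotypes reported
```

Under these assumptions, roughly 7000 tagSNP at 340 kb spacing correspond to about 2.38 Gb covered, and the 100 kb to 1 Mb spacing range corresponds to roughly 23,800 down to 2,380 tagSNP.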
The new commercially available 9K tagSNP set can be used to obtain imputed genotypes with high accuracy, even when imputation is based on a comparatively small panel of reference haplotypes (128 haplotypes). Average imputation accuracy can be further increased by adding haplotypes to the reference panel. In addition, our results show that randomly sampling individuals to genotype for the construction of a reference haplotype panel is more cost efficient than specifically sampling older animals or trios, with no observed loss in imputation accuracy. We expect that the use of imputed genotypes in swine breeding will yield highly accurate predictions of GEBV, based on the observed accuracy and on reported results in dairy cattle, where genomic evaluation of some individuals is based on genotypes imputed with the same accuracy as in our Yorkshire population.