Evaluation of approaches for identifying population informative markers from high density SNP Chips
1 The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, Scotland, UK
2 Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
3 Wildgenes Laboratory, Royal Zoological Society of Scotland, Edinburgh EH12 6TS, Scotland, UK
BMC Genetics 2011, 12:45 doi:10.1186/1471-2156-12-45
Published: 13 May 2011
Genetic markers can be used to identify and verify the origin of individuals. Motivation for the inference of ancestry ranges from conservation genetics to forensic analysis. High density assays featuring Single Nucleotide Polymorphism (SNP) markers can be exploited to create a reduced panel containing the most informative markers for these purposes. The objectives of this study were to evaluate methods of marker selection and to determine the minimum number of markers from the BovineSNP50 BeadChip required to verify the origin of individuals in European cattle breeds. Four methods for measuring population differentiation were compared: delta, Wright's FST, Weir & Cockerham's FST and PCA. The informativeness of each SNP was estimated from the breed-specific allele frequencies, and the SNPs were ranked accordingly. Individual assignment analysis was then performed using the ranked informative markers. Log-likelihood ratio thresholds of increasing stringency were applied to assess the confidence of the assignment test.
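As an illustration of how per-SNP informativeness can be derived from breed-specific allele frequencies, the following is a minimal sketch of two of the measures named above. It assumes biallelic SNPs, equal weighting of breeds, delta taken as the mean pairwise absolute difference in allele frequency, and Wright's FST computed as (HT - HS) / HT. The function names, SNP labels and toy frequencies are hypothetical and are not taken from the study.

```python
from itertools import combinations

def delta_score(p):
    """Mean pairwise absolute allele-frequency difference for one SNP.

    p is a list with one allele frequency per breed (assumed layout).
    """
    pairs = list(combinations(p, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def wright_fst(p):
    """Wright's FST = (H_T - H_S) / H_T for one biallelic SNP.

    Breeds are weighted equally, which is a simplifying assumption.
    """
    p_bar = sum(p) / len(p)
    h_t = 2 * p_bar * (1 - p_bar)                    # expected heterozygosity, pooled
    h_s = sum(2 * q * (1 - q) for q in p) / len(p)   # mean within-breed heterozygosity
    return (h_t - h_s) / h_t if h_t > 0 else 0.0     # monomorphic SNPs score zero

def rank_snps(freqs_by_snp, score):
    """Return SNP names ordered from most to least informative."""
    return sorted(freqs_by_snp, key=lambda s: score(freqs_by_snp[s]), reverse=True)

# Hypothetical allele frequencies for three SNPs in three breeds.
toy_freqs = {
    "snp_a": [0.9, 0.1, 0.5],
    "snp_b": [0.5, 0.5, 0.5],
    "snp_c": [0.1, 0.2, 0.3],
}

ranking_by_fst = rank_snps(toy_freqs, wright_fst)
ranking_by_delta = rank_snps(toy_freqs, delta_score)
```

A reduced panel would then be formed by taking the top-ranked SNPs from such a list. The measures can disagree on real data, which is why the ranking method affects how many markers are needed.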
A 95% assignment success rate for the 384 individually genotyped animals was achieved with < 80, < 100, < 140 and < 200 SNP markers (with increasing stringency threshold levels) across all the examined methods for marker selection. No further gain in power of assignment was achieved by sampling in excess of 200 SNP markers. The marker selection method that required the fewest SNP markers to verify an animal's breed of origin was Wright's FST (60 to 140 SNPs, depending on the chosen degree of confidence). Certain breeds required fewer markers (< 100) to achieve 100% assignment success. In contrast, closely related breeds required more markers (~200) to achieve > 95% assignment success. The power of assignment, and therefore the number of SNP markers required, depends on the level of genetic heterogeneity and on the pool of samples considered.
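To make the role of the stringency threshold concrete, the sketch below shows one common form of a frequency-based assignment test. It assumes Hardy-Weinberg proportions within each breed, independent markers, genotypes coded as the number of copies (0, 1 or 2) of the allele whose frequency is tabulated, and a log10 likelihood ratio between the best and second-best breed as the confidence measure. The function names, the clamping value `eps`, the breed labels, the toy data and the threshold values are all hypothetical and are not taken from the study.

```python
import math

def log10_likelihood(genotypes, freqs, eps=0.01):
    """Log10 likelihood of a multilocus genotype given one breed's allele frequencies."""
    total = 0.0
    for g, p in zip(genotypes, freqs):
        p = min(max(p, eps), 1 - eps)   # clamp to avoid log(0) for fixed alleles
        q = 1 - p
        prob = {0: q * q, 1: 2 * p * q, 2: p * p}[g]
        total += math.log10(prob)
    return total

def assign(genotypes, breed_freqs, threshold):
    """Assign to the most likely breed if the log-likelihood ratio passes the threshold.

    Returns (breed or None, log-likelihood ratio of best versus second-best breed).
    """
    scores = {b: log10_likelihood(genotypes, f) for b, f in breed_freqs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    llr = scores[ranked[0]] - scores[ranked[1]]
    return (ranked[0] if llr >= threshold else None), llr

# Hypothetical panel of three SNPs and two breeds.
toy_breed_freqs = {
    "breed_x": [0.9, 0.8, 0.1],
    "breed_y": [0.1, 0.2, 0.9],
}
toy_genotypes = [2, 2, 0]

lenient_call, toy_llr = assign(toy_genotypes, toy_breed_freqs, threshold=1.0)
strict_call, _ = assign(toy_genotypes, toy_breed_freqs, threshold=6.0)
```

Raising the threshold leaves more animals unassigned for a given panel, so more markers are needed to reach the same success rate. This is consistent with the increasing marker counts reported at higher stringency levels.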
While all SNP selection methods produced marker panels capable of breed identification, the power of assignment varied markedly among analysis methods. Thus, with effective exploration of available high density genetic markers, a diagnostic panel of highly informative markers can be produced.