Characterization of a likelihood based method and effects of markers informativeness in evaluation of admixture and population group assignment
1 Yale University School of Medicine, Department of Psychiatry, New Haven, CT, USA
2 VA CT Healthcare Center, West Haven, CT, USA
3 Yale University School of Medicine, Departments of Epidemiology and Public Health, and Genetics, New Haven, CT, USA
4 University of Connecticut Health Center, Farmington, CT, USA
BMC Genetics 2005, 6:50 doi:10.1186/1471-2156-6-50
Published: 14 October 2005
Detection and evaluation of population stratification are crucial issues in the conduct of genetic association studies. Statistical approaches useful for understanding these issues have been proposed; these methods rely on information gained from genotyping sets of markers that reflect population ancestry. Applying these methods therefore first requires a set of markers informative for differentiating population genetic substructure (PGS). We have previously evaluated the performance of a Bayesian clustering method, implemented in the software STRUCTURE, in detecting PGS with a particular informative marker set. In this study, we used a likelihood-based method (LBM) to evaluate the informativeness of the same marker panel for assessing the potential for stratification in samples of European Americans (EAs) and African Americans (AAs), which are known to be admixed. The LBM calculates the probability of a set of genotypes from the known allele frequencies of each marker in a reference population, assuming Hardy-Weinberg equilibrium (HWE) for each marker and linkage equilibrium among markers.
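The likelihood calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the marker names (M1, M2), the population labels and all allele frequencies are hypothetical, and the small frequency floor used to avoid taking the logarithm of zero is an added assumption. Under HWE, a homozygous genotype has probability p^2 and a heterozygous genotype 2pq; under linkage equilibrium, the per-marker probabilities multiply, so their logarithms are summed.

```python
import math


def genotype_log_likelihood(genotype, allele_freqs, floor=1e-4):
    """Log-likelihood of a multilocus genotype in one reference population.

    genotype:     dict mapping marker -> (allele_a, allele_b)
    allele_freqs: dict mapping marker -> {allele: frequency}
    floor:        assumed lower bound for unseen alleles (not from the paper)
    """
    total = 0.0
    for marker, (a, b) in genotype.items():
        freqs = allele_freqs[marker]
        p = max(freqs.get(a, 0.0), floor)
        q = max(freqs.get(b, 0.0), floor)
        # Hardy-Weinberg equilibrium: p^2 for homozygotes, 2pq for heterozygotes.
        prob = p * p if a == b else 2.0 * p * q
        # Linkage equilibrium: markers are independent, so log-probabilities add.
        total += math.log(prob)
    return total


def assign(genotype, reference):
    """Return the reference population with the highest log-likelihood, plus all scores."""
    scores = {
        pop: genotype_log_likelihood(genotype, freqs)
        for pop, freqs in reference.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical reference allele frequencies for two populations and two markers.
REFERENCE = {
    "pop_A": {"M1": {"1": 0.9, "2": 0.1}, "M2": {"1": 0.2, "2": 0.8}},
    "pop_B": {"M1": {"1": 0.1, "2": 0.9}, "M2": {"1": 0.7, "2": 0.3}},
}

if __name__ == "__main__":
    individual = {"M1": ("1", "1"), "M2": ("2", "2")}
    best, scores = assign(individual, REFERENCE)
    print(best, scores)
```

In this sketch an individual is assigned to whichever reference population gives the higher likelihood, so the result is only as good as the reference frequencies supplied.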
In EAs, the assignment accuracy of the LBM exceeded 99% using the single most efficient marker, FY, and reached 100% using the 10 most efficient markers excluding FY. In AAs, the assignment accuracy reached 96.4% using FY, and exceeded 95% when at least the 9 most efficient markers were used. Comparison of the observed and reference allele frequencies (the latter derived from previous publications and public databases) shows that allele frequencies observed in EAs matched those of the reference group more closely than did allele frequencies observed in AAs. As a result, the LBM performed better in EAs than in AAs, as might be expected given the dependence of LBMs on prior knowledge of allele frequencies. Performance was not dependent on sample size.
The performance of the LBM depends on the efficiency and number of markers, and depends greatly on how well the available reference allele frequencies represent those of the population being assigned. This method is of value when the parental population is known and relevant allele frequencies are available.