Errors in estimating ancestral allele frequencies bias the estimates of ancestry fractions (Q), with many individuals ascribed too much admixture. The plot shows the relationship between the true ancestry fraction qi1 (fraction of ancestry attributed to population 1) and its estimate, as determined by a nonparametric regression (LOESS) model fitted to the results from analyses of 100 simulated datasets. Reference individuals are excluded from the plots and regression analyses. The conditional mean of the supervised estimates closely tracks the dotted line y = x, suggesting little bias. However, in panel (a) (simulations with FST = 0.01) the conditional mean of the unsupervised estimates deviates substantially from this line, exhibiting an upward bias for low qi1 and a downward bias for high qi1. The bias is mitigated in simulations with FST = 0.05, as shown in panel (b), or when a larger number of markers is used (J = 300,000; not shown).
Alexander and Lange BMC Bioinformatics 2011 12:246 doi:10.1186/1471-2105-12-246
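The conditional-mean curves described in the caption can be illustrated with a minimal sketch of a LOESS-style smoother (a local linear fit with tricube weights). This is not the authors' code or their simulation: the function names `tricube`, `loess_at`, and `simulate`, the span `frac`, and the toy data model (estimates shrunk toward 0.5 with Gaussian noise) are all assumptions chosen only to mimic the qualitative pattern of an upward bias at low qi1 and a downward bias at high qi1.

```python
import random


def tricube(u):
    """Tricube kernel weight for a scaled distance u; zero outside (-1, 1)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_at(x, y, x0, frac=0.3):
    """Estimate E[y | x = x0] with a weighted local linear fit.

    The bandwidth is the distance to the k-th nearest point, where
    k = frac * n (hypothetical default, not taken from the paper).
    """
    n = len(x)
    k = max(2, int(frac * n))
    dists = sorted(abs(xi - x0) for xi in x)
    # Slightly inflate the bandwidth so the k-th neighbour keeps a nonzero weight.
    h = dists[k - 1] * 1.0001 + 1e-12
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def simulate(n=500, shrink=0.8, noise=0.03, seed=0):
    """Toy data (assumption): estimates shrunk toward 0.5, clipped to [0, 1]."""
    rng = random.Random(seed)
    true_q = [rng.random() for _ in range(n)]
    est_q = [
        min(1.0, max(0.0, 0.5 + shrink * (q - 0.5) + rng.gauss(0.0, noise)))
        for q in true_q
    ]
    return true_q, est_q


if __name__ == "__main__":
    true_q, est_q = simulate()
    # Compare the smoothed conditional mean with the line y = x at a few points.
    for q0 in (0.1, 0.5, 0.9):
        print(q0, round(loess_at(true_q, est_q, q0), 3))
```

In this toy setting, a smoothed value above q0 at low q0 and below q0 at high q0 corresponds to the kind of deviation from y = x that the caption reports for the unsupervised estimates; setting `shrink=1.0` would instead give a curve close to y = x.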