Penalized estimation can reduce the bias in ancestry estimates that appears for small marker sets or closely related ancestral populations. We applied penalized estimation to the simulated dataset of 10,000 SNP markers from admixed individuals from two populations differentiated by F_ST = 0.01. Panel (a) shows that 5-fold cross-validation selects λ = 5 as the optimal strength of penalization. The results of penalization with λ = 5 are compared, in panel (b), with the maximum likelihood (unsupervised) estimates and with the supervised estimates, all visualized via nonparametric regression as in Figure 2. Reference individuals are excluded from the regression models.
Alexander and Lange BMC Bioinformatics 2011 12:246 doi:10.1186/1471-2105-12-246