Open Access Research article

Comparison of measures of marker informativeness for ancestry and admixture mapping

Lili Ding1, Howard Wiener2, Tilahun Abebe3, Mekbib Altaye1, Rodney CP Go2, Carolyn Kercsmar1, Greg Grabowski1, Lisa J Martin1, Gurjit K Khurana Hershey1, Ranajit Chakraborty4 and Tesfaye M Baye1*

Author Affiliations

1 Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA

2 Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA

3 Department of Biology, University of Northern Iowa, Cedar Falls, IA, USA

4 Center for Computational Genomics, Institute of Applied Genetics, Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA


BMC Genomics 2011, 12:622  doi:10.1186/1471-2164-12-622

Published: 20 December 2011

Additional files

Additional file 1:

Table S1: Summary statistics of the five measures of marker informativeness for the CEU and YRI populations in the HapMap phase III data. The table reports the mean, standard deviation, minimum, lower quartile, median, upper quartile, and maximum of each measure.

Format: DOCX Size: 14KB

Additional file 2:

Figure S1: Distribution of the five measures of marker informativeness for the CHB and JPT populations in the HapMap phase III data. Histograms of the five measures of marker informativeness. Almost all SNP markers showed low estimates of genetic informativeness.

Format: DOCX Size: 50KB

Additional file 3:

Table S2: Summary statistics of the five measures of marker informativeness for the CHB and JPT populations in the HapMap phase III data. The table reports the mean, standard deviation, minimum, lower quartile, median, upper quartile, and maximum of each measure.

Format: DOCX Size: 14KB

Additional file 4:

Table S3: Kappa statistics for the five measures of informativeness, with SNPs grouped by deciles of each measure. The table reports pairwise kappa statistics between the five measures.

Format: DOCX Size: 13KB
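As an illustration of the kind of comparison summarized in Table S3, the following minimal sketch bins two measures into deciles and computes an unweighted Cohen's kappa between the resulting group labels. It assumes rank-based deciles and the unweighted form of kappa; the table description does not say which variant was used, and the function names and toy values (`delta`, `fst`) are hypothetical.

```python
from collections import Counter


def decile_labels(values):
    """Assign each value a decile label 0..9 by rank (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 10 // n
    return labels


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Assumption: chance agreement is below 1 (otherwise kappa is undefined).
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy data only: 100 hypothetical SNPs scored by two measures.
delta = [i / 100 for i in range(100)]
fst = [d ** 2 for d in delta]           # monotone in delta, so ranks agree
reversed_measure = [-d for d in delta]  # ranks fully reversed

kappa_same = cohen_kappa(decile_labels(delta), decile_labels(fst))
kappa_reversed = cohen_kappa(decile_labels(delta), decile_labels(reversed_measure))
```

Two measures that rank the SNPs identically fall into the same deciles and give a kappa of 1, whereas measures that rank them in opposite order give a kappa below 0.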

Additional file 5:

Figure S2: Scatter plot of allele frequencies in the CEU and YRI populations, partitioned into the ten groups defined by deciles of each measure of informativeness. The top-left and bottom-right corners represent the most informative SNPs, whereas the least informative SNPs reside at the center of the plot.

Format: DOCX Size: 87KB

Additional file 6:

Figure S3: Number of AIMs needed to achieve specific accuracies for founder populations. The two founder populations are (a) CEU and YRI and (b) CHB and JPT.

Format: DOCX Size: 82KB

Additional file 7:

Figure S4: Inferred population structure for the CEU, YRI and ASW populations with two clusters and 200 AIMs selected by FIC. A plot of the inferred population structure of the CEU, YRI and ASW populations. The analysis was done in STRUCTURE and distruct with two clusters.

Format: DOCX Size: 33KB

Additional file 8:

Figure S5: Estimate of ancestry contribution vs. number of top AIMs for the CEU, YRI and ASW populations from HapMap phase III data. Top panel: estimate of CEU contribution for the CEU population. Middle panel: estimate of YRI contribution for the YRI population. Bottom panel: estimate of YRI contribution for the ASW population.

Format: DOCX Size: 50KB

Additional file 9:

Figure S6: Absolute error in the estimation of mean ancestry contribution for the simulated admixed populations. A plot of absolute error in the admixed population simulated from (a) CEU and YRI and (b) CHB and JPT.

Format: DOCX Size: 68KB

Additional file 10:

Table S4: Summary statistics of estimation errors of mean ancestry contribution for the ASW population. The estimates were based on 100 random subsets of 20 SNPs drawn from panels consisting of the top 1%, 2%, 5%, and 10% of the AIMs for the CEU and YRI populations. The gold-standard or 'true' ancestry contribution was taken as 78%, as estimated from a collection of 3299 AIMs for the CEU and YRI populations, all of which were selected as top 10% AIMs by at least one of the five measures.

Format: DOCX Size: 18KB

Additional file 11:

Table S5: Summary statistics of estimation errors of mean ancestry contribution for the admixed population simulated from CEU and YRI. The estimates were based on 100 random subsets of 20 SNPs drawn from panels consisting of the top 1%, 2%, 5%, and 10% of the AIMs for the CEU and YRI populations. The true ancestry contribution was 70%.

Format: DOCX Size: 18KB

Additional file 12:

Table S6: Summary statistics of estimation errors of mean ancestry contribution for the admixed population simulated from CHB and JPT. The estimates were based on 100 random subsets of 50 SNPs drawn from panels consisting of the top 1%, 2%, 5%, and 10% of the AIMs for the CHB and JPT populations. The true ancestry contribution was 72%.

Format: DOCX Size: 17KB

Additional file 13:

Table S7: Overlap of SNP markers between measures. Diagonal (bolded): Number of SNPs genotyped in both populations and satisfying the filtering criteria. Upper triangle: Overlap for the SNP markers. Lower triangle: Overlap for the top 500 ranked SNP markers.

Format: DOCX Size: 14KB

Additional file 14:

Figure S7: Scatter plot of allele frequency difference between the CEU and YRI populations using current cutoff values for each measure. Markers in red exceeded the cutoff for the measure of informativeness. FST and In showed similar patterns. Delta yielded the largest AIM panel and included a large number of loci not included by any of the remaining four measures. SIC gave the smallest AIM panel.

Format: DOCX Size: 21KB

Additional file 15:

Table S8: FIC - Sensitivity of the selection of AIMs to the proportion of ancestry contribution. For each pair of proportions of ancestry contribution (m and m'), we examined the overlap between the two top n% AIM panels selected using m and m' in the computation of FIC. Overlap patterns are coded as 11: AIMs selected by both panels; 10: AIMs selected by panel one (m) but not panel two (m'); and 01: AIMs selected by panel two (m') but not panel one (m). The frequency and percentage of each overlap pattern are reported for the top 1%, 5%, 10%, and 20% of AIMs. The proportions of ancestry contribution considered were 0.1, 0.2, 0.3, 0.4, and 0.5.

Format: DOCX Size: 18KB
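The 11/10/01 overlap coding used in the sensitivity tables can be sketched as follows. This is a minimal illustration rather than the authors' code: the marker IDs and scores are made up, the helper names are hypothetical, and the percentage denominator (the number of AIMs appearing in either panel) is an assumption, since the table descriptions do not state it.

```python
def top_fraction(scores, fraction):
    """Return the set of marker IDs whose scores rank in the top `fraction`."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def overlap_patterns(panel_one, panel_two):
    """Count and percentage of each overlap pattern between two AIM panels.

    11: in both panels; 10: only in panel one; 01: only in panel two.
    Assumption: percentages are taken over the union of the two panels.
    """
    counts = {
        "11": len(panel_one & panel_two),
        "10": len(panel_one - panel_two),
        "01": len(panel_two - panel_one),
    }
    total = len(panel_one | panel_two)
    return {key: (count, 100.0 * count / total) for key, count in counts.items()}


# Toy informativeness scores for ten hypothetical markers, computed under
# two assumed ancestry proportions m and m'.
ids = ["snp%d" % i for i in range(10)]
scores_m = dict(zip(ids, [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]))
scores_m_prime = dict(zip(ids, [0.9, 0.3, 0.8, 0.6, 0.5, 0.4, 0.7, 0.2, 0.1, 0.0]))

panel_m = top_fraction(scores_m, 0.20)              # top 20% under m
panel_m_prime = top_fraction(scores_m_prime, 0.20)  # top 20% under m'
patterns = overlap_patterns(panel_m, panel_m_prime)
```

In this toy case each pattern occurs once: one marker is selected under both proportions, one only under m, and one only under m'.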

Additional file 16:

Table S9: SIC - Sensitivity of the selection of AIMs to the proportion of ancestry contribution. For each pair of proportions of ancestry contribution (m and m'), we examined the overlap between the two top n% AIM panels selected using m and m' in the computation of SIC. Overlap patterns are coded as 11: AIMs selected by both panels; 10: AIMs selected by panel one (m) but not panel two (m'); and 01: AIMs selected by panel two (m') but not panel one (m). The frequency and percentage of each overlap pattern are reported for the top 1%, 5%, 10%, and 20% of AIMs. The proportions of ancestry contribution considered were 0.1, 0.2, 0.3, 0.4, and 0.5.

Format: DOCX Size: 18KB