Open Access Research article

ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction

Mohsen Hajiloo (1,2), Yadav Sapkota (3,5), John R Mackey (4,5), Paula Robson (5,6), Russell Greiner (1,2) and Sambasivarao Damaraju (3,5)*

Author Affiliations

1 Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada

2 Alberta Innovates Centre for Machine Learning, University of Alberta, Edmonton, Alberta, Canada

3 Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, Alberta, Canada

4 Department of Oncology, University of Alberta, Edmonton, Alberta, Canada

5 Cancer Care, Alberta Health Services, Edmonton, Alberta, Canada

6 Department of Agricultural, Food and Nutritional Sciences, University of Alberta, Edmonton, Alberta, Canada

BMC Bioinformatics 2013, 14:61  doi:10.1186/1471-2105-14-61

Published: 22 February 2013

Additional files

Additional file 1:

Appendix A. 10-fold cross-validation accuracy of individual decision trees and of the ensemble of disjoint decision trees of variable size on the continental and sub-continental classification problems. This Excel file gives the accuracies for each problem on a separate sheet. In each sheet, the first column specifies the decision tree index, the second column specifies the accuracy of the individual decision tree, and the third column specifies the accuracy of the ensemble of disjoint decision trees.

Format: XLSX Size: 26KB

Open Data

Additional file 2:

Appendix B. Statistics of the classifiers generated by ETHNOPRED, covering the accuracy metric and the robustness-to-missing-values metric, for the different continental and sub-continental population classification problems. This Excel file presents the statistics for each classification problem in a separate row.

Format: XLSX Size: 208KB

Open Data

Additional file 3:

Appendix C. Rule-based format of the continental ancestry identification model.

Format: DOCX Size: 21KB

Open Data

Additional file 4:

Appendix D. Summary statistics of the SNPs used by the ETHNOPRED method to tackle the different continental and sub-continental population classification problems under the accuracy satisfaction condition. This Excel file gives the summary statistics of the SNPs used for each problem on a separate sheet.

Format: XLSX Size: 184KB

Open Data

Additional file 5:

Appendix E. Summary statistics of the SNPs used by the ETHNOPRED method to tackle the different continental and sub-continental population classification problems under the robustness-to-missing-values satisfaction condition. This Excel file gives the summary statistics of the SNPs used for each problem on a separate sheet.

Format: XLSX Size: 819KB

Open Data

Additional file 6:

Appendix F. ETHNOPRED’s output file for a dataset of 696 subjects selected from a breast cancer susceptibility study in Caucasian women of Alberta, Canada [45].

Format: XLSX Size: 37KB

Open Data

Additional file 7:

Appendix G. Comparison of self-declared lineage information, EIGENSTRAT’s results and ETHNOPRED’s results for 348 controls selected for a breast cancer susceptibility study in Caucasian women of Alberta, Canada [45].

Format: XLSX Size: 43KB

Open Data