Table 7

Summary of results for the sub-continental classification problems
| Sub-continental problem | Number of subjects, split | Number of SNPs | Baseline | DT1 (Number of SNPs), Accuracy | Minimal Number of DTs (Number of SNPs), Accuracy | Number of Robust DTs (Number of SNPs) |
| --- | --- | --- | --- | --- | --- | --- |
| European | 267 (CEU: 165, TSI: 102) | 882895 | 61.8% | 1 (10), 79.0% ± 5.6% | 3 (31), 86.6% ± 2.4% | 15 (180) |
| East Asian | 250 (CHB: 137, JPT: 113) | 892833 | 54.8% | 1 (12), 74.4% ± 7.9% | 39 (502), 95.6% ± 3.9% | 67 (877) |
| African | 497 (LWK: 110, MKK: 184, YRI: 203) | 616597 | 40.8% | 1 (23), 66.2% ± 5.3% | 21 (526), 95.6% ± 2.1% | 157 (4236) |
| North American | 548 (ASW: 87, CEU: 165, CHD: 109, GIH: 101, MXL: 86) | 526394 | 30.1% | 1 (19), 82.7% ± 5.4% | 11 (242), 98.4% ± 2.0% | 70 (1643) |
| Kenyan | 294 (LWK: 110, MKK: 184) | 781061 | 62.6% | 1 (11), 79.2% ± 3.5% | 25 (271), 95.9% ± 1.5% | 31 (341) |
| Chinese | 246 (CHB: 137, CHD: 109) | 829364 | 55.7% | 1 (15), 47.2% ± 9.1% | - (-), ≤55.7% | - (-) |

This table summarizes the results of our studies on various sub-continental classification problems. The “Number of subjects, split” column shows the total number of subjects, followed by a list of (ethnic group: number) pairs giving the name and size of each subgroup. The “Number of SNPs” column gives the number of SNPs used in the study. The “Baseline” column gives the accuracy obtained by always predicting the majority class. The “DT1 (Number of SNPs), Accuracy” column gives the number of SNPs in the first decision tree and its estimated 10-fold cross-validation accuracy. The “Minimal Number of DTs (Number of SNPs), Accuracy” column gives the minimal number of disjoint decision trees required to achieve the highest accuracy, the number of SNPs involved in these trees, and that accuracy. The “Number of Robust DTs (Number of SNPs)” column gives the number of decision trees required to achieve robustness and the number of SNPs they involve.
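The majority-class baseline can be checked directly from the subgroup sizes listed in the table. The following is a minimal sketch, not code from the paper; the names `splits`, `majority_baseline`, and `baselines` are illustrative assumptions, and the group sizes are copied from the "Number of subjects, split" column.

```python
# Subgroup sizes copied from the "Number of subjects, split" column of Table 7.
# The dictionary layout and all names here are illustrative, not from the paper.
splits = {
    "European": {"CEU": 165, "TSI": 102},
    "East Asian": {"CHB": 137, "JPT": 113},
    "African": {"LWK": 110, "MKK": 184, "YRI": 203},
    "North American": {"ASW": 87, "CEU": 165, "CHD": 109, "GIH": 101, "MXL": 86},
    "Kenyan": {"LWK": 110, "MKK": 184},
    "Chinese": {"CHB": 137, "CHD": 109},
}


def majority_baseline(group_sizes):
    """Return the accuracy of always predicting the largest subgroup."""
    return max(group_sizes.values()) / sum(group_sizes.values())


# Baseline accuracy (as a fraction) for each classification problem.
baselines = {problem: majority_baseline(sizes) for problem, sizes in splits.items()}

for problem, accuracy in baselines.items():
    total = sum(splits[problem].values())
    print(f"{problem}: {total} subjects, baseline {accuracy:.1%}")
```

Rounded to one decimal place, these fractions reproduce the "Baseline" column, and the subgroup sizes sum to the totals reported for each problem.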

Hajiloo et al. BMC Bioinformatics 2013 14:61   doi:10.1186/1471-2105-14-61