Figure S1, Additional file 1. Validation by data source. Validation efficiencies cannot be determined from existing validated SNP data sets, because only 777 SNPs are currently available in dbSNP for rhesus macaque, and most of these are polymorphic between subspecies (Chinese versus Indian) rather than within a subspecies (Indian versus Indian). We therefore calculated the proportion of unique SNPs validated in each pairwise comparison of all 15 data sets. On average, ~35% of the potential SNPs from each resequencing data set were validated. r02120 displays a much higher validation rate (~65%) than all other data sets, likely owing to technical issues that resulted in much lower coverage. The Sub-species comparison data set exhibited a validation rate (~40%) similar to those of the resequencing data sets, which may be explained by this data set containing many more SNPs than the other non-resequenced data sets. The smaller data sets (MamuSNP, ENCODE, dbSNP, and all e-genotyping data sets) all displayed much lower validation rates (~10%), both because they were generated using animals of both Chinese and Indian origin rather than Indian animals only, and because each contains only a small number of SNPs.
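
The pairwise calculation described in this legend can be illustrated with a minimal sketch. It rests on assumptions not stated in the legend: each SNP is identified by a (chromosome, position) pair, and a SNP from one data set counts as validated by another data set if the same pair occurs in both. The function name `pairwise_validation`, the data set labels, and the positions are hypothetical and are not the study's data or pipeline.

```python
from itertools import permutations


def pairwise_validation(datasets):
    """Return {(a, b): fraction of a's unique SNPs that also occur in b}.

    Assumption: a SNP is a (chromosome, position) tuple, and "validated"
    means the same tuple is present in the other data set.
    """
    # Collapse duplicates so each SNP is counted once per data set.
    unique = {name: set(snps) for name, snps in datasets.items()}
    rates = {}
    # Ordered pairs, because the denominator depends on which set is "a".
    for a, b in permutations(unique, 2):
        if unique[a]:
            rates[(a, b)] = len(unique[a] & unique[b]) / len(unique[a])
        else:
            rates[(a, b)] = float("nan")  # no SNPs to validate
    return rates


# Hypothetical toy input; labels and positions are made up for illustration.
toy = {
    "set_A": [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)],
    "set_B": [("chr1", 100), ("chr2", 50), ("chr3", 10)],
    "set_C": [("chr1", 200), ("chr1", 200)],
}

rates = pairwise_validation(toy)
for (a, b), rate in sorted(rates.items()):
    print(f"{a} validated by {b}: {rate:.2f}")
```

The rate is asymmetric because the denominator is the number of unique SNPs in the first data set of each pair. Under this definition, a data set with few SNP calls can show a high rate when most of its calls recur elsewhere.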

