Open Access Research article

Characterization of single-nucleotide variation in Indian-origin rhesus macaques (Macaca mulatta)

Gloria L Fawcett (1,2), Muthuswamy Raveendran (1), David Rio Deiros (1), David Chen (1), Fuli Yu (1,2), Ronald Alan Harris (2), Yanru Ren (1), Donna M Muzny (1,2), Jeffrey G Reid (1,2), David A Wheeler (1,2), Kimberly C Worley (1,2), Steven E Shelton (4), Ned H Kalin (4,5), Aleksandar Milosavljevic (2), Richard Gibbs (1,2) and Jeffrey Rogers (1,2,3)*

Author Affiliations

1 Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA

2 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA

3 Southwest National Primate Research Center, San Antonio, Texas 78245, USA

4 Department of Psychiatry, the HealthEmotions Research Institution, University of Wisconsin-Madison, Madison, Wisconsin, 53719, USA

5 Department of Psychology, Waisman Laboratory for Brain Imaging and Behavior, University of Wisconsin-Madison, Madison, Wisconsin, 53719, USA


BMC Genomics 2011, 12:311  doi:10.1186/1471-2164-12-311

Published: 13 June 2011

Additional files

Additional file 1:

Figure S1, Additional file 1. Validation by data source. Validation efficiencies cannot be determined from existing validated SNP data sets, because only 777 SNPs are currently available in dbSNP for the rhesus macaque, and most of these are polymorphic between subspecies (Chinese vs. Indian origin) rather than within a subspecies (among Indian-origin animals). We therefore calculated the proportion of unique SNPs validated in each pairwise comparison of all 15 data sets. On average, ~35% of the potential SNPs from each resequencing data set were validated. Data set r02120 shows a much higher validation rate (~65%) than all other data sets, likely because technical issues resulted in much lower coverage. The subspecies comparison data set showed a validation rate (~40%) similar to those of the resequenced data sets, which may be explained by its containing many more SNPs than the other non-resequenced data sets. The smaller data sets (MamuSNP, ENCODE, dbSNP, and all e-genotyping data sets) all showed significantly lower validation rates (~10%), both because they were generated from a mix of Chinese- and Indian-origin animals rather than from Indian-origin animals only, and because each contains few SNPs.

Format: PDF; size: 1.1 MB
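
The pairwise validation proportions described for Figure S1 can be illustrated with a short Python sketch. It is not the authors' pipeline: it assumes, hypothetically, that each data set is a set of (chromosome, position, allele) tuples and that a SNP from data set A counts as validated by data set B when B reports the same position and allele. The data set names and records are made up.

```python
from itertools import permutations

# Hypothetical SNP key: (chromosome, position, allele).
SNP = tuple[str, int, str]


def pairwise_validation(datasets: dict[str, set[SNP]]) -> dict[tuple[str, str], float]:
    """For each ordered pair (A, B) of data sets, return the fraction of A's
    unique SNPs that B also reports at the same position with the same allele."""
    rates = {}
    for a, b in permutations(datasets, 2):
        snps_a = datasets[a]
        if not snps_a:
            continue  # an empty data set has no rate to report
        rates[(a, b)] = len(snps_a & datasets[b]) / len(snps_a)
    return rates


def mean_validation(rates: dict[tuple[str, str], float], name: str) -> float:
    """Average validation rate of data set `name` over all of its comparisons."""
    values = [r for (a, _), r in rates.items() if a == name]
    return sum(values) / len(values) if values else 0.0


# Toy input with made-up data set names and records.
example = {
    "setA": {("1", 100, "T"), ("1", 250, "G"), ("2", 40, "A"), ("3", 7, "C")},
    "setB": {("1", 100, "T"), ("2", 40, "A")},
    "setC": {("1", 100, "T"), ("1", 250, "C")},
}
rates = pairwise_validation(example)
print(rates[("setA", "setB")])        # 0.5
print(mean_validation(rates, "setA"))  # 0.375
```

With all 15 data sets loaded, each set's mean would be taken over its 14 comparisons. Note that ("1", 250, "G") in setA is not validated by ("1", 250, "C") in setC, because the alleles differ.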

Additional file 2:

Figure S2, Additional file 2. Sanger read coverage for validated SNPs. All SNPs found in all three resequenced animals (r1766, r02120, and 17573; 306,782 SNPs, Figure 1B) were tested for Sanger read coverage in the 17573 Sanger data, by looking for a SNP call with the same location and allele in the Sanger output (red bars). Average genome-wide Sanger read coverage was 5.2X. To test whether read coverage was distributed differently between the two SNP data sets, we analyzed an identical number of randomly selected SNPs that were detected in only two of the three resequenced animals (blue bars). The two distributions are indistinguishable, and the proportions of tested SNPs with 11 or more reads are not statistically different (3.8% for SNPs found in all three animals, 3.6% for SNPs detected in only two animals).

Format: PDF; size: 144 KB
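
A sketch of the coverage comparison behind Figure S2, under stated assumptions: the SNP keys, the `sanger_depth` lookup (number of Sanger reads supporting the same location and allele), and the toy data are hypothetical. It tabulates read-depth histograms for the two SNP sets and the fraction with 11 or more reads, using a size-matched random sample from the two-of-three set. It does not reproduce the authors' significance test, which the caption does not name.

```python
import random
from collections import Counter

# Hypothetical SNP key: (chromosome, position, allele).
SNP = tuple[str, int, str]


def coverage_profile(snps: set[SNP], sanger_depth: dict[SNP, int], high: int = 11):
    """Return a histogram of Sanger read depth over `snps` and the fraction of
    SNPs supported by at least `high` reads. SNPs missing from `sanger_depth`
    are counted as having zero supporting Sanger reads."""
    depths = [sanger_depth.get(s, 0) for s in snps]
    histogram = Counter(depths)
    frac_high = sum(d >= high for d in depths) / len(depths) if depths else 0.0
    return histogram, frac_high


def matched_random_sample(pool: set[SNP], n: int, seed: int = 0) -> list[SNP]:
    """Draw n SNPs uniformly at random, without replacement, from `pool`.
    The pool is sorted first so results are reproducible for a given seed."""
    rng = random.Random(seed)
    return rng.sample(sorted(pool), n)


# Toy data: SNPs seen in all three animals vs. in only two of them.
in_all_three = {("1", 10, "A"), ("1", 20, "G"), ("2", 5, "T"), ("2", 9, "C")}
in_two_only = {("1", 30, "C"), ("1", 44, "T"), ("2", 1, "G"),
               ("2", 70, "A"), ("3", 8, "T"), ("3", 15, "G")}
depth = {("1", 10, "A"): 12, ("1", 20, "G"): 3, ("2", 5, "T"): 11,
         ("1", 30, "C"): 4, ("3", 8, "T"): 13}

hist_three, frac_three = coverage_profile(in_all_three, depth)
control = matched_random_sample(in_two_only, len(in_all_three))
hist_two, frac_two = coverage_profile(set(control), depth)
print(frac_three)  # 0.5
```

Sorting the pool before sampling is a deliberate choice: `random.sample` needs a sequence, and a fixed order makes the control set reproducible.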

Additional file 3:

Table S1, Additional file 3. SOLiD sequencing statistics. Whether we used our new SOLiD data or prior data from the literature, our validation procedure validated only two alleles at any location, regardless of which data sets were compared. For some SNPs, neither of the two validated alleles matched the published reference allele. Eight of these are likely due to differences between Indian-origin and Chinese-origin animals. For the vast majority, however, there are two possible explanations, each accounting for an undetermined proportion of these SNPs: (1) some reference allele (Sanger) calls are wrong, and (2) a small proportion of the apparent three-allele SNPs are true SNPs with three valid alleles in Indian-origin rhesus macaques.

Format: XLS; size: 23 KB
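
The discordant-reference check described for Table S1 can be sketched as follows, assuming (hypothetically) dictionaries of validated alleles and reference alleles keyed by (chromosome, position). A site is flagged when none of its validated alleles equals the reference allele; the flag alone cannot tell which of the two explanations in the caption applies.

```python
# Hypothetical inputs, keyed by (chromosome, position):
#   validated: the set of alleles the procedure validated at each site
#   reference: the published (Sanger) reference allele at each site
Site = tuple[str, int]


def classify_sites(validated: dict[Site, set[str]], reference: dict[Site, str]):
    """Split sites into those whose validated alleles include the reference
    allele and those where no validated allele matches the reference."""
    consistent, discordant = [], []
    for site, alleles in validated.items():
        if len(alleles) > 2:
            # the caption states that at most two alleles are validated per site
            raise ValueError(f"more than two validated alleles at {site}")
        ref = reference.get(site)
        if ref is None:
            continue  # no reference base available; leave the site unclassified
        (consistent if ref in alleles else discordant).append(site)
    return consistent, discordant


validated = {("1", 100): {"A", "G"}, ("1", 200): {"C", "T"}, ("2", 50): {"G", "T"}}
reference = {("1", 100): "A", ("1", 200): "G", ("2", 50): "T"}
consistent, discordant = classify_sites(validated, reference)
print(discordant)  # [('1', 200)]
```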

Additional file 4:

Figure S3, Additional file 4. dbSNP concordance of e-genotype. One human SOLiD sample (26X coverage) from the 1000 Genomes pilot data was analyzed at the ~1 million SNPs identified in that project for that sample. Bars and lines are color-coded as follows: correct homozygous calls (blue), heterozygotes called as homozygotes (yellow), homozygotes called as the wrong homozygote (grey), correct heterozygous calls (green), and erroneous heterozygous calls (red). The total miscall rate was ~2.6%. Excluding errors at very high coverage (incorrect heterozygous calls from repetitive regions) and at very low coverage (too few reads to detect heterozygotes), shown outside the dotted lines, the miscall rate was 1.6%. Overall probe coverage was reduced to 8.2X by the extremely stringent requirement that probes match exactly over their full 31 bp length. The percentages of homozygous and heterozygous calls in each correct or error category were plotted on linear axes against probe coverage, showing that errors were much more likely at very high and very low probe coverage, whereas correct calls were most likely at intermediate coverage.

Format: PDF; size: 99 KB
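
A minimal sketch of the miscall-rate calculation summarized in Figure S3. The genotype encoding, the input tuples, and the coverage window bounds are assumptions; the caption marks the excluded coverage extremes only with dotted lines and gives no numeric cutoffs. Heterozygous calls that differ from a true heterozygote are lumped into the erroneous-heterozygote category here, a case the caption does not address.

```python
from collections import Counter

# Hypothetical encoding: genotypes are two-letter strings such as "AG"; each
# call is (comparison genotype, genotype called from probes, probe coverage).
ERRORS = {"het_as_hom", "wrong_hom", "bad_het"}


def categorize(truth: str, called: str) -> str:
    """Assign a call to one of the five categories used in Figure S3."""
    truth, called = "".join(sorted(truth)), "".join(sorted(called))
    if called[0] != called[1]:  # a heterozygous call
        return "good_het" if called == truth else "bad_het"
    if truth[0] != truth[1]:    # heterozygote called as a homozygote
        return "het_as_hom"
    return "good_hom" if called == truth else "wrong_hom"


def miscall_rate(calls, min_cov=None, max_cov=None):
    """Fraction of calls in an error category, optionally restricted to
    calls whose probe coverage lies within [min_cov, max_cov]."""
    counts = Counter()
    for truth, called, cov in calls:
        if (min_cov is not None and cov < min_cov) or (max_cov is not None and cov > max_cov):
            continue  # outside the coverage window
        counts[categorize(truth, called)] += 1
    total = sum(counts.values())
    errors = sum(counts[k] for k in ERRORS)
    return (errors / total if total else 0.0), counts


calls = [("AA", "AA", 8), ("AG", "AG", 9), ("AG", "AA", 2), ("CC", "TT", 10),
         ("CC", "CT", 40), ("GG", "GG", 7), ("TT", "TT", 12), ("AC", "CA", 6)]
overall, _ = miscall_rate(calls)                          # 3 errors / 8 calls
trimmed, _ = miscall_rate(calls, min_cov=5, max_cov=30)   # 1 error / 6 calls
print(overall)  # 0.375
```

Sorting the alleles of each genotype first means "AC" and "CA" compare as the same heterozygote.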

Additional file 5:

Table S2, Additional file 5. Part 1 of dbSNP ss numbers for validated rhesus macaque SNPs. Part 1 of 6 of a table listing all ss (submitted SNP) numbers assigned when the validated rhesus macaque SNPs were submitted to dbSNP. The ss numbers are not contiguous across the full set of 3,038,166 SNPs.

Format: TXT; size: 12.9 MB

Additional file 6:

Table S2, Additional file 6. Part 2 of dbSNP ss numbers for validated rhesus macaque SNPs. Part 2 of 6 of a table listing all ss numbers assigned when the validated rhesus macaque SNPs were submitted to dbSNP. The ss numbers are not contiguous across the full set of 3,038,166 SNPs.

Format: TXT; size: 12.9 MB

Additional file 7:

Table S2, Additional file 7. Part 3 of dbSNP ss numbers for validated rhesus macaque SNPs. Part 3 of 6 of a table listing all ss numbers assigned when the validated rhesus macaque SNPs were submitted to dbSNP. The ss numbers are not contiguous across the full set of 3,038,166 SNPs.

Format: TXT; size: 12.9 MB

Additional file 8:

Table S2, Additional file 8. Part 4 of dbSNP ss numbers for validated rhesus macaque SNPs. Part 4 of 6 of a table listing all ss numbers assigned when the validated rhesus macaque SNPs were submitted to dbSNP. The ss numbers are not contiguous across the full set of 3,038,166 SNPs.

Format: TXT; size: 12.9 MB

Additional file 9:

Table S2, Additional file 9. Part 5 of dbSNP ss numbers for validated rhesus macaque SNPs. Part 5 of 6 of a table listing all ss numbers assigned when the validated rhesus macaque SNPs were submitted to dbSNP. The ss numbers are not contiguous across the full set of 3,038,166 SNPs.

Format: TXT; size: 12.9 MB

Additional file 10:

Table S2, Additional file 10. Part 6 of dbSNP ss numbers for validated rhesus macaque SNPs. Part 6 of 6 of a table listing all ss numbers assigned when the validated rhesus macaque SNPs were submitted to dbSNP. The ss numbers are not contiguous across the full set of 3,038,166 SNPs.

Format: TXT; size: 13.9 MB
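
Because the ss numbers in Table S2 are not contiguous, a reader combining the six parts may want to list the gaps in the numbering. The sketch below assumes, without confirmation from the files themselves, that each line carries one ss number as its first run of digits; the helper names and file paths are hypothetical.

```python
import re
from pathlib import Path

DIGITS = re.compile(r"\d+")


def parse_ss_numbers(lines):
    """Return the first run of digits on each line as an integer ss ID, so that
    both 'ss123456' and '123456' parse; lines without digits are skipped."""
    ids = []
    for line in lines:
        match = DIGITS.search(line)
        if match:
            ids.append(int(match.group()))
    return ids


def load_parts(paths):
    """Concatenate ss IDs from the six part files (paths are hypothetical)."""
    ids = []
    for path in paths:
        with Path(path).open() as fh:
            ids.extend(parse_ss_numbers(fh))
    return ids


def gaps(ids):
    """Return inclusive (start, end) ranges of integers missing between
    consecutive distinct IDs, i.e. where the numbering is not contiguous."""
    s = sorted(set(ids))
    return [(a + 1, b - 1) for a, b in zip(s, s[1:]) if b - a > 1]


ids = parse_ss_numbers(["ss10", "ss11", "ss14", "ss15", "ss20"])
print(gaps(ids))  # [(12, 13), (16, 19)]
```

If the one-ID-per-line assumption holds, the total returned by `load_parts` over all six files should equal the 3,038,166 SNPs stated in the captions.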
