Open Access Research article

High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence

David L Hyten1*, Steven B Cannon2, Qijian Song1,3, Nathan Weeks2, Edward W Fickus1, Randy C Shoemaker2, James E Specht4, Andrew D Farmer5, Gregory D May5 and Perry B Cregan1

Author affiliations

1 Soybean Genomics and Improvement Laboratory, U.S. Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA

2 Department of Agronomy, U.S. Department of Agriculture, Agricultural Research Service, Iowa State University, Ames, IA 50011, USA

3 Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD 20742, USA

4 Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA

5 National Center for Genome Resources, Santa Fe, NM 87505, USA

Citation and License

BMC Genomics 2010, 11:38  doi:10.1186/1471-2164-11-38

Published: 15 January 2010



The Soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy, but its marker density was only sufficient to properly orient 66% of the sequence scaffolds. The discovery and genetic mapping of more single nucleotide polymorphism (SNP) markers were needed to anchor and orient the remaining genome sequence. To that end, next generation sequencing and high-throughput genotyping were combined to obtain a much higher resolution genetic map that could be used to anchor and orient most of the remaining sequence and to help validate the integrity of the existing scaffold builds.
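The distinction between anchoring and orienting a scaffold can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `classify_scaffolds`, the marker tuple layout, and the example data are hypothetical, and it assumes that all markers on a scaffold map to the same linkage group. Under those assumptions, one mapped marker is enough to anchor a scaffold, but at least two markers at different genetic positions are needed to infer its orientation.

```python
from collections import defaultdict


def classify_scaffolds(markers):
    """Classify scaffolds from mapped markers (illustrative only).

    markers: iterable of (scaffold_id, bp_position, cM_position) tuples.
    Assumes every marker on a scaffold lies on the same linkage group.
    """
    by_scaffold = defaultdict(list)
    for scaffold, bp, cm in markers:
        by_scaffold[scaffold].append((bp, cm))

    result = {}
    for scaffold, hits in by_scaffold.items():
        hits.sort()  # order markers by physical position on the scaffold
        first_cm, last_cm = hits[0][1], hits[-1][1]
        if first_cm < last_cm:
            result[scaffold] = "anchored, forward"
        elif first_cm > last_cm:
            result[scaffold] = "anchored, reverse"
        else:
            # A single marker, or markers with no recombination between
            # them, place the scaffold but cannot orient it.
            result[scaffold] = "anchored, orientation unknown"
    return result


# Hypothetical example data, not taken from the study.
example_markers = [
    ("scaf_A", 1_000, 10.0),
    ("scaf_A", 900_000, 14.5),
    ("scaf_B", 5_000, 33.0),
    ("scaf_B", 700_000, 30.2),
    ("scaf_C", 2_000, 50.0),
]
example_result = classify_scaffolds(example_markers)
```

This simplification compares only the outermost markers on each scaffold. A real analysis would also need to handle conflicting marker orders and markers that map to different linkage groups, which is one way a dense map can help validate scaffold integrity.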


Between 7,108 and 25,047 predicted SNPs were discovered using a reduced representation library that was subsequently sequenced by the Illumina sequencing-by-synthesis method on the clonal single molecule array platform. Across multiple SNP prediction methods, the validation rate of these SNPs ranged from 79% to 92.5%. A high resolution genetic map was created from 444 recombinant inbred lines using 1,790 SNP markers. Of these 1,790 mapped SNP markers, 1,240 had been selectively chosen to target existing unanchored or un-oriented sequence scaffolds, which increased the amount of anchored sequence to 97%.


We have demonstrated how next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs. Those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8× whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism.