Open Access Research article

Genome-wide SNP discovery in walnut with an AGSNP pipeline updated for SNP discovery in allogamous organisms

Frank M You12*, Karin R Deal1, Jirui Wang1, Monica T Britton3, Joseph N Fass3, Dawei Lin3, Abhaya M Dandekar1, Charles A Leslie1, Mallikarjuna Aradhya4, Ming-Cheng Luo1 and Jan Dvorak1*

Author affiliations

1 Department of Plant Sciences, University of California, Davis, CA 95616, USA

2 Cereal Research Centre, Agriculture and Agri-Food Canada, Winnipeg, MB R3T 2M9, Canada

3 Genome Center Bioinformatics Core Facility, University of California, Davis, CA, 95616, USA

4 Germplasm Repository, USDA-ARS, Davis, CA, 95616, USA

For all author emails, please log on.

Citation and License

BMC Genomics 2012, 13:354  doi:10.1186/1471-2164-13-354

Published: 31 July 2012



A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.).


The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to filter out repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here for using SOLiD or other type of short reads of a heterozygous individual for these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of vegetatively propagated walnut scion cultivar ‘Chandler’ were mapped to 48,661 ‘Chandler’ bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified. A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium BeadChip, which was used to genotype a walnut mapping population having ‘Chandler’ as one of the parents. Genotyping results were used to adjust the filtering parameters of the updated AGSNP pipeline. With the adjusted filtering criteria, 69.6% of SNPs discovered with the updated pipeline were real and could be mapped on the walnut genetic map. A total of 13,439 SNPs were discovered by BES re-sequencing. BESs harboring SNPs were in 677 FPC contigs covering 98% of the physical map of the walnut genome.


The updated AGSNP pipeline is a versatile SNP discovery tool for a high-throughput, genome-wide SNP discovery in both autogamous and allogamous species. With this pipeline, a large set of SNPs were identified in a single walnut cultivar.

BAC; Physical map; BAC end sequence; Infinium; Single nucleotide polymorphism; Genome sequence; SOLiD; Walnut; AGSNP