Oligonucleotide array discovery of polymorphisms in cultivated tomato (Solanum lycopersicum L.) reveals patterns of SNP variation associated with breeding
1 Department of Horticulture and Crop Science, The Ohio State University, Ohio Agricultural Research and Development Center, 1680 Madison Ave, Wooster, OH 44691, USA
2 Syngenta Biotechnology, Inc, 3054 East Cornwallis Road, Research Triangle Park, NC 27709, USA
BMC Genomics 2009, 10:466 doi:10.1186/1471-2164-10-466Published: 9 October 2009
Cultivated tomato (Solanum lycopersicum L.) has narrow genetic diversity that makes it difficult to identify polymorphisms between elite germplasm. We explored array-based single feature polymorphism (SFP) discovery as a high-throughput approach for marker development in cultivated tomato.
Three varieties, FL7600 (fresh-market), OH9242 (processing), and PI114490 (cherry) were used as a source of genomic DNA for hybridization to oligonucleotide arrays. Identification of SFPs was based on outlier detection using regression analysis of normalized hybridization data within a probe set for each gene. A subset of 189 putative SFPs was sequenced for validation. The rate of validation depended on the desired level of significance (α) used to define the confidence interval (CI), and ranged from 76% for polymorphisms identified at α ≤ 10-6 to 60% for those identified at α ≤ 10-2. Validation percentage reached a plateau between α ≤ 10-4 and α ≤ 10-7, but failure to identify known SFPs (Type II error) increased dramatically at α ≤ 10-6. Trough sequence validation, we identified 279 SNPs and 27 InDels in 111 loci. Sixty loci contained ≥ 2 SNPs per locus. We used a subset of validated SNPs for genetic diversity analysis of 92 tomato varieties and accessions. Pairwise estimation of θ (Fst) suggested significant differentiation between collections of fresh-market, processing, vintage, Latin American (landrace), and S. pimpinellifolium accessions. The fresh-market and processing groups displayed high genetic diversity relative to vintage and landrace groups. Furthermore, the patterns of SNP variation indicated that domestication and early breeding practices have led to progressive genetic bottlenecks while modern breeding practices have reintroduced genetic variation into the crop from wild species. Finally, we examined the ratio of non-synonymous (Ka) to synonymous substitutions (Ks) for 20 loci with multiple SNPs (≥ 4 per locus). Six of 20 loci showed ratios of Ka/Ks ≥ 0.9.
Array-based SFP discovery was an efficient method to identify a large number of molecular markers for genetics and breeding in elite tomato germplasm. Patterns of sequence variation across five major tomato groups provided insight into to the effect of human selection on genetic variation.