Open Access Research article

A genome-wide association study of seed protein and oil content in soybean

Eun-Young Hwang1, Qijian Song2, Gaofeng Jia2, James E Specht3, David L Hyten2,4, Jose Costa1,5 and Perry B Cregan2*

Author Affiliations

1 Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD 20742, USA

2 USDA, Agricultural Research Service, Soybean Genomics and Improvement Lab, Beltsville, MD 20705, USA

3 Agronomy & Horticulture Department, University of Nebraska, Lincoln, NE 68583, USA

4 Present address: DuPont Pioneer, 8305 NW 62nd Ave., PO Box 7060, Johnston, IA 50131, USA

5 Present address: USDA-ARS, Crop Production and Protection, GWCC-BLTSVL, Beltsville, MD 20705, USA

BMC Genomics 2014, 15:1  doi:10.1186/1471-2164-15-1

Published: 2 January 2014



Association analysis is an alternative to conventional family-based methods for locating genes or quantitative trait loci (QTL), and it defines the genome position of a gene or QTL at relatively high resolution. Seed protein and oil concentration are quantitative traits determined by the interaction of many genes with small to moderate genetic effects and by their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify QTL controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content.


A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods, including Illumina Infinium and GoldenGate assays, and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) declined rapidly to 0.2 within 360 Kbp, whereas in heterochromatic regions the mean LD declined to 0.2 at 9,600 Kbp. The GWAS identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the strongest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has previously been mapped to the same location, and potential candidate genes have recently been identified in this region. The GWAS also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs were significantly associated with both protein and oil.
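The minor allele frequency filter and the LD statistic mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state how LD was computed, so the sketch assumes genotypes coded as alternate-allele counts (0, 1, 2) and uses the squared Pearson correlation between two SNPs' genotype vectors as an approximation of r², which is reasonable for largely homozygous inbred accessions. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def minor_allele_frequency(genotypes):
    """Per-SNP minor allele frequency.

    genotypes: (n_accessions, n_snps) float array of alternate-allele
    counts (0, 1 or 2), with np.nan for missing calls (assumed coding).
    """
    alt_freq = np.nanmean(genotypes, axis=0) / 2.0
    return np.minimum(alt_freq, 1.0 - alt_freq)

def filter_by_maf(genotypes, threshold=0.10):
    """Keep SNPs whose minor allele frequency exceeds the threshold."""
    keep = minor_allele_frequency(genotypes) > threshold
    return genotypes[:, keep], keep

def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNP genotype vectors.

    Used here as a genotype-based approximation of the LD statistic r^2;
    returns np.nan if either SNP is monomorphic among shared calls.
    """
    ok = ~np.isnan(snp_a) & ~np.isnan(snp_b)
    a, b = snp_a[ok], snp_b[ok]
    if a.std() == 0 or b.std() == 0:
        return np.nan
    r = np.corrcoef(a, b)[0, 1]
    return r * r

# Toy data: 6 accessions x 4 SNPs (made up for illustration only).
toy = np.array([
    [0, 0, 2, 0],
    [0, 0, 2, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [0, 2, 2, 0],
    [2, 0, 0, 0],
], dtype=float)

filtered, keep = filter_by_maf(toy, threshold=0.10)
r2_complete = ld_r2(toy[:, 0], toy[:, 2])  # SNP 3 mirrors SNP 1
r2_partial = ld_r2(toy[:, 0], toy[:, 1])   # SNPs 1 and 2 agree in 4 of 6 accessions
```

In this toy matrix the fourth SNP is monomorphic and is dropped by the filter. Averaging such pairwise r² values within bins of physical distance, separately for euchromatic and heterochromatic SNP pairs, would give an LD decay curve of the kind summarized above.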


This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also defined narrower genomic regions than those previously reported to contain these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).

Keywords: GWAS; Glycine max; Seed protein and oil content; Single nucleotide polymorphism; Linkage disequilibrium