Research article

Effect of sample stratification on dairy GWAS results

Li Ma1,5, George R Wiggans2, Shengwen Wang1, Tad S Sonstegard3, Jing Yang1, Brian A Crooker1, John B Cole2, Curtis P Van Tassell2,3, Thomas J Lawlor4 and Yang Da1*

Author Affiliations

1 Department of Animal Science, University of Minnesota, St. Paul, Minnesota, USA

2 Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, Maryland, USA

3 Bovine Functional Genomics Laboratory, Agricultural Research Service, USDA, Beltsville, Maryland, USA

4 Holstein Association USA, Brattleboro, Vermont, USA

5 Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA

BMC Genomics 2012, 13:536  doi:10.1186/1471-2164-13-536

Published: 6 October 2012

Additional files

Additional file 1:

Figure S1. Multidimensional scaling (MDS) plots of SNP genotypes of 1,654 contemporary Holstein cows by chromosome. C1 = dimension 1, C2 = dimension 2. Left column: C1 and C2 values were calculated using 1,654 contemporary cows. Right column: C1 and C2 values were calculated using 2,366 Holstein cattle, including the University of Minnesota Holstein control line that remained unselected since 1964.

Format: PDF Size: 2.9MB

Additional file 2:

Figure S2. Pedigree of the 1,654 contemporary cows traced back to ancestors born in the 1930s (approximately 10–15 generations). Gold circles are the 1,654 cows used in the genome-wide association analysis. The pedigree shows that all 1,654 cows are related.

Format: PDF Size: 2.2MB

Additional file 3:

Figure S3. Overlap between genome stratification and half-sib family structure. C1 = dimension 1, C2 = dimension 2. Left column: C1 and C2 values were calculated using 1,654 contemporary Holstein cows. Right column: C1 and C2 values were calculated using 2,366 Holstein cattle, including the University of Minnesota Holstein control line that remained unselected since 1964.

Format: PDF Size: 3MB

Additional file 4:

Figure S4. Overlap between genome stratification and phenotypic stratification of 31 traits. C1 = dimension 1, C2 = dimension 2. Left column: C1 and C2 values were calculated using 1,654 contemporary Holstein cows. Right column: C1 and C2 values were calculated using 2,366 contemporary and historical Holstein cattle, including the University of Minnesota Holstein control line that remained unselected since 1964. ‘Top 200’ are the 200 cows with the highest PTA values for the trait, ‘Bottom 200’ are the 200 cows with the lowest PTA values for the trait, and ‘Other’ are cows with PTA values between top 200 and bottom 200.

Format: PDF Size: 2.9MB
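The ‘Top 200’, ‘Bottom 200’ and ‘Other’ grouping used in Figure S4 can be illustrated with a minimal sketch. The function name `label_by_pta`, the dictionary input and the tie handling are assumptions made for illustration; the caption does not describe how the authors implemented the grouping.

```python
def label_by_pta(pta_by_cow, n_extreme=200):
    """Label each cow by the rank of its PTA value for one trait.

    pta_by_cow: dict mapping a cow identifier to its PTA value (hypothetical input).
    n_extreme:  size of the top and bottom groups (200 in the figure caption).
    Ties are broken by input order, which is an assumption of this sketch.
    """
    if n_extreme < 1 or 2 * n_extreme > len(pta_by_cow):
        raise ValueError("n_extreme must be >= 1 and at most half the number of cows")

    # Rank cows from highest to lowest PTA.
    ranked = sorted(pta_by_cow, key=pta_by_cow.get, reverse=True)
    top = set(ranked[:n_extreme])
    bottom = set(ranked[-n_extreme:])

    labels = {}
    for cow in ranked:
        if cow in top:
            labels[cow] = f"Top {n_extreme}"
        elif cow in bottom:
            labels[cow] = f"Bottom {n_extreme}"
        else:
            labels[cow] = "Other"
    return labels
```

With 1,654 cows and `n_extreme=200`, this would leave 1,254 cows in the ‘Other’ group for each trait.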

Additional file 5:

Figure S5. Overlap between genome stratification and phenotypic stratification for chromosome 1 and the X chromosome. Column 1: chromosome 1; Column 2: X chromosome; C1 and C2 values were calculated using 1,654 contemporary Holstein cows. Column 3: chromosome 1; Column 4: X chromosome; C1 and C2 values were calculated using 2,366 Holstein cattle, including the University of Minnesota Holstein control line that remained unselected since 1964. C1 = dimension 1, C2 = dimension 2; ‘Top 200’ are the 200 cows with the highest PTA values for the trait, ‘Bottom 200’ are the 200 cows with the lowest PTA values for the trait, and ‘Other’ are cows with PTA values between top 200 and bottom 200.

Format: PDF Size: 7.3MB

Additional file 6:

Figure S6. Global view of P-values of 45,878 SNP effects per trait for 31 production, health, reproduction and body conformation traits by three methods for stratification correction. MY, milk yield; FY, fat yield; PY, protein yield; FPC, fat percentage; PPC, protein percentage; PL, productive life; SCS, somatic cell score; DPR, daughter pregnancy rate; SCE, service-sire calving ease; DCE, daughter calving ease; SSB, service-sire stillbirth; DSB, daughter stillbirth; NM, net merit; STA, stature; STR, strength; BD, body depth; DF, dairy form; RA, rump angle; RW, rump width; FUA, fore udder attachment; RUH, rear udder height; UD, udder depth; UC, udder cleft; FTP, front teat placement; RTP, rear teat placement; TL, teat length; FA, foot angle; RLS, rear legs (side view); RLR, rear legs (rear view); FL, feet and legs; FS, final score. Yellow triangle indicates confirmation among all methods for stratification correction.

Format: PDF Size: 14.7MB

Additional file 7:

Table S1. (Excel file) Output file of top 100 effects on 31 dairy traits by EMMAX tests. Sheet 1: Results of EMMAX using identity by state (IBS) among all individuals. Sheet 2: Results of EMMAX using the Balding-Nichols kinship matrix among all individuals. Chr30 is the X chromosome, and Chr32 indicates markers with unknown chromosome locations. MY, milk yield; FY, fat yield; PY, protein yield; FPC, fat percentage; PPC, protein percentage; SCS, somatic cell score; DPR, daughter pregnancy rate; PL, productive life; SCE, service-sire calving ease; DCE, daughter calving ease; SSB, service-sire stillbirth; DSB, daughter stillbirth; NM, net merit; STA, stature; STR, strength; BD, body depth; RW, rump width; DF, dairy form; RA, rump angle; FUA, fore udder attachment; RUH, rear udder height; UD, udder depth; UC, udder cleft; FTP, front teat placement; RTP, rear teat placement; TL, teat length; FA, foot angle; RLS, rear legs (side view); RLR, rear legs (rear view); FL, feet and legs; FS, final score.

Format: XLSX Size: 1.9MB

Additional file 8:

Table S2. (Excel file) Output file of top 100 effects on 31 dairy traits by generalized least squares (GLS) tests. Chr30 is the X chromosome, and Chr32 indicates markers with unknown chromosome locations. MY, milk yield; FY, fat yield; PY, protein yield; FPC, fat percentage; PPC, protein percentage; SCS, somatic cell score; DPR, daughter pregnancy rate; PL, productive life; SCE, service-sire calving ease; DCE, daughter calving ease; SSB, service-sire stillbirth; DSB, daughter stillbirth; NM, net merit; STA, stature; STR, strength; BD, body depth; RW, rump width; DF, dairy form; RA, rump angle; FUA, fore udder attachment; RUH, rear udder height; UD, udder depth; UC, udder cleft; FTP, front teat placement; RTP, rear teat placement; TL, teat length; FA, foot angle; RLS, rear legs (side view); RLR, rear legs (rear view); FL, feet and legs; FS, final score.

Format: XLSX Size: 640KB

Additional file 9:

Table S3. (Excel file) Output file of top 100 effects on 31 dairy traits with stratification correction based on principal component analysis (PCA) using the top 20 principal components as covariables. Chr30 is the X chromosome, and Chr32 indicates markers with unknown chromosome locations. MY, milk yield; FY, fat yield; PY, protein yield; FPC, fat percentage; PPC, protein percentage; SCS, somatic cell score; DPR, daughter pregnancy rate; PL, productive life; SCE, service-sire calving ease; DCE, daughter calving ease; SSB, service-sire stillbirth; DSB, daughter stillbirth; NM, net merit; STA, stature; STR, strength; BD, body depth; RW, rump width; DF, dairy form; RA, rump angle; FUA, fore udder attachment; RUH, rear udder height; UD, udder depth; UC, udder cleft; FTP, front teat placement; RTP, rear teat placement; TL, teat length; FA, foot angle; RLS, rear legs (side view); RLR, rear legs (rear view); FL, feet and legs; FS, final score.

Format: XLSX Size: 532KB

Additional file 10:

Table S4. (Excel file) Overlap between top 100 effects per trait for 31 dairy traits from methods for stratification correction and the top 100 effects from the analysis without stratification correction. A1_elite160: frequency of allele 1 in the elite cluster of 160 cows; A1_1494: frequency of allele 1 in the remaining 1,494 cows excluding the elite cluster; Allele 1 = A for AC, AG and AT, = C for CG and CT, = G for GT (GT not observed in our SNP data set). MY, milk yield; FY, fat yield; PY, protein yield; FPC, fat percentage; PPC, protein percentage; SCS, somatic cell score; DPR, daughter pregnancy rate; PL, productive life; SCE, service-sire calving ease; DCE, daughter calving ease; SSB, service-sire stillbirth; DSB, daughter stillbirth; NM, net merit; STA, stature; STR, strength; BD, body depth; RW, rump width; DF, dairy form; RA, rump angle; FUA, fore udder attachment; RUH, rear udder height; UD, udder depth; UC, udder cleft; FTP, front teat placement; RTP, rear teat placement; TL, teat length; FA, foot angle; RLS, rear legs (side view); RLR, rear legs (rear view); FL, feet and legs; FS, final score. E: the effect from the method without stratification correction [6] was among the top 100 effects from EMMAX-IBS; G: the effect from the method without stratification correction [6] was among the top 100 effects from the GLS method; P: the effect from the method without stratification correction [6] was among the top 100 effects from the PCA method; EG: the effect from the method without stratification correction [6] was among the top 100 effects from EMMAX-IBS and GLS; EP: the effect from the method without stratification correction [6] was among the top 100 effects from EMMAX-IBS and PCA; GP: the effect from the method without stratification correction [6] was among the top 100 effects from GLS and PCA; EGP: the effect from the method without stratification correction [6] was among the top 100 effects from EMMAX-IBS, GLS and PCA. ‘0’ indicates this top 100 effect was not detected by EMMAX-IBS, GLS or PCA.

Format: XLSX Size: 173KB
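The allele 1 rule and the overlap codes in Table S4 follow simple patterns that a short sketch can make concrete. Allele 1 is the alphabetically first of the two SNP alleles, and each overlap code concatenates one letter per confirming method, taking P for PCA as implied by the combined codes EP, GP and EGP. The function names `allele_1` and `overlap_code` are hypothetical and are not taken from the authors' software.

```python
VALID_BASES = set("ACGT")


def allele_1(snp_alleles):
    """Return allele 1 for a two-allele SNP given as a string such as 'AG' or 'CT'.

    Follows the caption's rule: A for AC, AG and AT; C for CG and CT; G for GT.
    """
    alleles = snp_alleles.upper()
    if len(alleles) != 2 or alleles[0] == alleles[1] or not set(alleles) <= VALID_BASES:
        raise ValueError(f"expected two distinct bases from A, C, G, T; got {snp_alleles!r}")
    # The rule in the caption is equivalent to taking the alphabetically first base.
    return min(alleles)


def overlap_code(in_emmax_ibs, in_gls, in_pca):
    """Build the Table S4 overlap code from three booleans.

    Each flag states whether the effect found without stratification correction
    was also among the top 100 effects of that corrected method.
    """
    code = ""
    if in_emmax_ibs:
        code += "E"
    if in_gls:
        code += "G"
    if in_pca:
        code += "P"
    # '0' marks an effect not detected by any of the three methods.
    return code or "0"
```

For example, an effect confirmed by EMMAX-IBS and GLS but not by PCA receives the code `EG`, and an effect confirmed by none of the three receives `0`.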

Additional file 11:

Figure S7. Manhattan plots of the AIPL effect distribution, and results from three sets of analyses: 1) LS, GLS and EMMAX-IBS using the full data set of 1,654 cows; 2) adding PCA to GLS and EMMAX-IBS using 1,654 cows; and 3) LS, GLS and EMMAX using 1,494 cows after removing the 160 elite cows. Red triangle indicates confirmation between effect size and significance test(s). Black triangle indicates confirmation of the AIPL effect by a nearby SNP marker. Yellow triangle indicates confirmation between EMMAX and GLS. Green triangle indicates eliminated or reduced significance due to adding PCA to GLS or EMMAX, or due to removing the 160 elite cows from the analysis. Blue triangle indicates increased significance due to adding PCA to GLS or EMMAX, or due to removing the 160 elite cows from the analysis.

Format: PDF Size: 13.1MB