Research article

Effect of sample stratification on dairy GWAS results

Li Ma1,5, George R Wiggans2, Shengwen Wang1, Tad S Sonstegard3, Jing Yang1, Brian A Crooker1, John B Cole2, Curtis P Van Tassell2,3, Thomas J Lawlor4 and Yang Da1*

Author affiliations

1 Department of Animal Science, University of Minnesota, St. Paul, Minnesota, USA

2 Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, Maryland, USA

3 Bovine Functional Genomics Laboratory, Agricultural Research Service, USDA, Beltsville, Maryland, USA

4 Holstein Association USA, Brattleboro, Vermont, USA

5 Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA


Citation and License

BMC Genomics 2012, 13:536  doi:10.1186/1471-2164-13-536

Published: 6 October 2012



Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach.
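As an illustration of the PCA approach named above, the following is a minimal sketch of principal-component adjustment in a single-SNP regression. It is not the authors' implementation: the function names, the use of plain least squares, and the genotype coding (allele counts 0/1/2, one row per animal) are assumptions made for this example only.

```python
import numpy as np

def top_principal_components(genotypes, n_components):
    """Return leading PC scores (one row per animal) of a genotype matrix.

    genotypes: (animals x SNPs) array of allele counts (assumed coding 0/1/2).
    """
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the column-centered matrix; u scaled by s gives the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def snp_effect_with_pcs(phenotype, snp, pcs):
    """Least-squares effect of one SNP, with an intercept and PCs as covariates."""
    design = np.column_stack([np.ones(len(snp)), snp, pcs])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef[1]  # coefficient on the SNP column
```

Passing a PC matrix with zero columns gives the unadjusted SNP effect, so the same function can be used to compare estimates with and without correction. The number of components is a tuning choice (the study reports using 20).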


Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10–15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows, and also contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. The EMMAX method produced the most severe reduction in the number of significant effects, whereas the PCA method using 20 principal components and the GLS method yielded similar significance levels. Removing the elite cows from the analysis without applying stratification correction eliminated many of the same effects that the three correction methods removed, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, and a SNP 45 kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows.


Genetic selection and extensive use of artificial insemination contributed to overlapping genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.