This article is part of the supplement: Proceedings of the 14th European workshop on QTL mapping and marker assisted selection (QTL-MAS)
Association analyses of the MAS-QTL data set using grammar, principal components and Bayesian network methodologies
1 The Roslin Institute and R(D)SVS, University of Edinburgh, EH25 9PS, Roslin, UK
2 Tomi Silander, A*STAR Institute of High Performance Computing, Fusionopolis, 1 Fusionopolis Way, 16-16 Connexis, 138632, Singapore
3 Department of Genetics, University of Santiago de Compostela, ES-27002 Lugo, Galiza, Spain
4 MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
BMC Proceedings 2011, 5(Suppl 3):S8 doi:10.1186/1753-6561-5-S3-S8
Published: 27 May 2011
Genome-wide association studies that do not take genetic relationships among individuals into account can produce false positives. To address this problem, we used Genome-wide Rapid Association using Mixed Model and Regression (GRAMMAR) and principal component stratification analyses. To account for linkage disequilibrium among the significant markers, principal component loadings obtained from the top markers can be included as covariates. Estimating Bayesian networks may also be useful for investigating linkage disequilibrium among SNPs and their relationship with environmental variables.
For the quantitative trait, we first estimated residuals while taking polygenic effects into account. We then used a single-SNP approach to detect the most significant SNPs based on these residuals and applied principal component regression to account for linkage disequilibrium among those SNPs. For the categorical trait, we used principal component stratification to account for background effects and principal component logit regression to correct for linkage disequilibrium. Bayesian networks were estimated to investigate relationships among SNPs.
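The two-step idea for the quantitative trait (adjust the phenotype first, then test one SNP at a time on the residuals) can be illustrated with a minimal sketch. This is not the GRAMMAR implementation: as a simplifying assumption, the mixed-model polygenic adjustment is replaced here by subtracting family means, and the function names, toy data, and family labels are all hypothetical.

```python
from math import sqrt
from statistics import mean


def family_residuals(phenotypes, families):
    """Subtract each family's mean phenotype.

    Assumption: a crude stand-in for the polygenic adjustment,
    not the mixed model used by GRAMMAR.
    """
    by_family = {}
    for y, fam in zip(phenotypes, families):
        by_family.setdefault(fam, []).append(y)
    family_mean = {fam: mean(ys) for fam, ys in by_family.items()}
    return [y - family_mean[fam] for y, fam in zip(phenotypes, families)]


def snp_t_statistic(residuals, genotypes):
    """t statistic for the slope of residual ~ allele count (0, 1, 2)."""
    n = len(residuals)
    g_bar, r_bar = mean(genotypes), mean(residuals)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    sxy = sum((g - g_bar) * (r - r_bar) for g, r in zip(genotypes, residuals))
    slope = sxy / sxx
    intercept = r_bar - slope * g_bar
    sse = sum((r - intercept - slope * g) ** 2
              for g, r in zip(genotypes, residuals))
    standard_error = sqrt(sse / (n - 2) / sxx)
    return slope / standard_error


# Hypothetical toy data: two half-sib families, one SNP.
families = ["s1", "s1", "s1", "s2", "s2", "s2"]
phenotypes = [10.0, 11.0, 12.5, 20.0, 21.5, 22.0]
genotypes = [0, 1, 2, 0, 1, 2]

residuals = family_residuals(phenotypes, families)
t_stat = snp_t_statistic(residuals, genotypes)
print(round(t_stat, 2))
```

In a full analysis the same single-SNP test would be repeated for every marker, and the top markers would then be passed to a principal component regression step, which is not shown here.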
Using the Genome-wide Rapid Association using Mixed Model and Regression and principal component stratification approaches, we detected around 100 significant SNPs for the quantitative trait (p < 0.05 with 1000 permutations) and 109 significant SNPs for the categorical trait (p < 0.0006 with local FDR correction). Additional principal component regression reduced these lists to 16 and 50 SNPs for the quantitative and categorical traits, respectively.
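One common way to obtain a permutation-based significance threshold such as the one quoted for the quantitative trait is the max-statistic procedure sketched below. The text does not state the exact permutation scheme, so this is an assumption for illustration only. It also assumes the adjusted residuals are approximately exchangeable across individuals, and it uses the absolute Pearson correlation as the test statistic. All names are hypothetical.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no association signal
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(residuals, snp_matrix, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the genome-wide maximum statistic.

    snp_matrix is a list of SNPs, each a list of allele counts per individual.
    """
    rng = random.Random(seed)
    shuffled = list(residuals)
    max_stats = []
    for _ in range(n_perm):
        # Shuffling residuals breaks any SNP-trait association
        # while keeping the correlation among SNPs intact.
        rng.shuffle(shuffled)
        max_stats.append(max(abs_correlation(snp, shuffled) for snp in snp_matrix))
    max_stats.sort()
    index = min(n_perm - 1, round((1 - alpha) * n_perm))
    return max_stats[index]
```

A SNP whose observed statistic exceeds the returned threshold would be declared significant at the chosen genome-wide level. Because the maximum is taken over all SNPs in each permutation, the threshold accounts for the number of markers tested and for linkage disequilibrium among them.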
GRAMMAR efficiently incorporated information on random genetic effects. Principal component stratification, used to correct for ancestral stratification in association analyses of binary traits, should be applied cautiously and with stringent multiple hypothesis testing correction when there are systematic genetic effects such as half-sib family structures. Bayesian networks are useful for investigating relationships among SNPs and environmental variables.