Institute of Medical Biometry and Informatics, University Hospital Heidelberg, INF 305, 69120 Heidelberg, Germany
Division of Molecular Genetic Epidemiology, German Cancer Research Center, INF 520, 69120 Heidelberg, Germany
Institute of Human Genetics, University of Heidelberg, INF 366, 69120 Heidelberg, Germany
Division of Cancer Epidemiology, German Cancer Research Center, INF 280, 69120 Heidelberg, Germany
Center for Family and Community Medicine, Karolinska Institute, 14183 Huddinge, Sweden
Abstract
The results from association studies are usually summarized by a measure of evidence of association (frequentist or Bayesian probability values) that does not directly reflect the impact of the detected signals on familial aggregation. This article investigates the possible advantage of a two-dimensional representation of genetic association in order to identify polymorphisms relevant to disease: a measure of evidence of association (the Bayes factor, BF) combined with the estimated contribution to familiality (the attributable sibling relative risk,
Introduction
Susceptibility to rheumatoid arthritis (RA) is determined by both genetic and environmental factors, with an estimated sibling relative risk of 5 to 10. The
Let us assume that a polymorphism is a marker of a rarer causal variant, and that the marker-specific PAF equals the PAF of the causal variant. Under this assumption, we have demonstrated that the causal variant has a higher attributable
There is much debate concerning the representation of statistical evidence in GWA studies. The frequentist
We hypothesize here that the representation of genetic association by BF together with the attributable
Methods
Derivation of attributable risks and Bayes factors
The sibling relative risk (
The BF statistic is the ratio of the probability of the observed data under the assumption that there is a true association to its probability under the null hypothesis (absence of association). A small BF provides evidence in favor of a true association. To investigate the relationship between attributable
GRR_{hom }= (
The expected distributions of genotypes were investigated by Bayesian logistic regression. We considered a general three-genotype model of association. Calculation of BFs requires assumptions about effect sizes. We assumed N(0,1) priors on the mean and the two genetic effects. The function
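As a rough illustration of how a BF can be obtained for the general three-genotype logistic model with N(0,1) priors, the following sketch uses a Laplace approximation to the marginal likelihoods. This is an assumption made for illustration only: the routine actually used by the authors is not recoverable from this text, and all function and variable names below are hypothetical. The sketch returns log_{10} of the ratio P(data | association) / P(data | no association), following the wording of the definition above; the sign flips under the inverse convention.

```python
import numpy as np

# Design matrices over the three genotypes (0, 1, 2 copies of the risk allele).
X_NULL = np.array([[1.0], [1.0], [1.0]])        # intercept only
X_GENERAL = np.array([[1.0, 0.0, 0.0],          # reference genotype
                      [1.0, 1.0, 0.0],          # heterozygote effect
                      [1.0, 0.0, 1.0]])         # homozygote effect


def log_marginal_laplace(X, cases, controls, prior_sd=1.0, max_iter=100):
    """Laplace approximation to the log marginal likelihood of a grouped
    logistic model with independent N(0, prior_sd^2) priors on all coefficients."""
    cases = np.asarray(cases, dtype=float)
    totals = cases + np.asarray(controls, dtype=float)
    d = X.shape[1]
    precision = 1.0 / prior_sd ** 2
    theta = np.zeros(d)

    def risk_and_hessian(t):
        p = 1.0 / (1.0 + np.exp(-(X @ t)))
        weights = totals * p * (1.0 - p)
        # Negative Hessian of the log posterior (likelihood part + prior part).
        return p, X.T @ (X * weights[:, None]) + precision * np.eye(d)

    # Newton iterations towards the posterior mode (no line search; adequate
    # for well-behaved genotype counts, an assumption of this sketch).
    for _ in range(max_iter):
        p, hessian = risk_and_hessian(theta)
        gradient = X.T @ (cases - totals * p) - precision * theta
        step = np.linalg.solve(hessian, gradient)
        theta = theta + step
        if np.max(np.abs(step)) < 1e-10:
            break

    p, hessian = risk_and_hessian(theta)
    log_lik = np.sum(cases * np.log(p) + (totals - cases) * np.log1p(-p))
    log_prior = (-0.5 * precision * (theta @ theta)
                 - 0.5 * d * np.log(2.0 * np.pi / precision))
    _, log_det = np.linalg.slogdet(hessian)
    return log_lik + log_prior + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * log_det


def log10_bayes_factor(cases, controls, prior_sd=1.0):
    """log10 of P(data | general model) / P(data | null model)."""
    m1 = log_marginal_laplace(X_GENERAL, cases, controls, prior_sd)
    m0 = log_marginal_laplace(X_NULL, cases, controls, prior_sd)
    return (m1 - m0) / np.log(10.0)
```

Here `cases` and `controls` are genotype counts ordered by the number of risk alleles, for example `log10_bayes_factor([300, 450, 250], [500, 400, 100])`. Passing a different `prior_sd` changes the assumed spread of effect sizes.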
Investigated regions
This study investigated regions around the 12 signals detected in the WTCCC study with associated probability values of less than 10^{-5} (Table
Description and results from the 12 investigated regions
Columns Chr, SNP, and Position are from the WTCCC study^{a}; all other columns are based on NARAC data^{b}.

| Chr | SNP | Position | MAD of log_{10} GRR_{hom} | MAD of log_{10} GRR_{het} | Outlying SNPs, gene regions | log_{10}(BF) | RAF | λ_{s} Median (5^{th}-95^{th}) |
|---|---|---|---|---|---|---|---|---|
| 1p13 | rs6679677 | 11,401,850 | 0.282 | 0.199 | rs2476601, | 3.91 | 0.084 | 1.048 (1.017-1.098) |
| 1p36 | rs6684865 | 2,578,391 | 0.282 | 0.199 | - | - | - | - |
| 1p31 | rs11162922 | 80,284,079 | 0.282 | 0.199 | - | - | - | - |
| 4p15 | rs3816587 | 25,093,513 | 0.255 | 0.188 | rs12505556, | 3.04 | 0.112 | 1.071 (1.011-1.630) |
| 6p21a | rs6457617 | 32,771,829 | 0.306 | 0.206 | rs2395175, | 16.48 | 0.148 | 1.164 (1.108-1.241) |
| | | | | | rs7765379, | 6.11 | 0.113 | 1.175 (1.053-1.668) |
| 6p21b | rs615672 | 32,682,149 | 0.306 | 0.206 | rs2395175, | 16.48 | 0.148 | 1.164 (1.108-1.241) |
| | | | | | rs7765379 | 6.11 | 0.113 | 1.175 (1.053-1.668) |
| 6q23 | rs6920220 | 138,048,197 | 0.306 | 0.206 | - | - | - | - |
| 7q32 | rs11761231 | 130,827,294 | 0.268 | 0.185 | - | - | - | - |
| 10p15 | rs2104286 | 6,139,051 | 0.290 | 0.197 | - | - | - | - |
| 13q12 | rs9550642 | 19,848,092 | 0.286 | 0.200 | rs1407961, | 3.59 | 0.920 | 1.033 (1.033-1.225) |
| 21q22 | rs2837960 | 41,433,788 | 0.288 | 0.208 | rs468646, | 4.28 | 0.488 | 1.038 (1.019-1.057) |
| | | | | | rs466092, | 3.59 | 0.488 | 1.030 (1.016-1.048) |
| 22q13 | rs743777 | 35,876,107 | 0.283 | 0.216 | rs3218258, | 9.63 | 0.771 | 1.061 (1.041-1.084) |
| | | | | | rs710183, | 4.26 | 0.072 | 1.028 (1.011-1.183) |
| | | | | | rs8137446, | 3.48 | 0.903 | 1.035 (1.018-1.123) |
^{a}Results from the WTCCC study.
^{b}Results based on NARAC.
Calculation of attributable risks and Bayes factors for NARAC data
SNPs in the 12 selected intervals were extracted from NARAC data, and the association between RA and the retrieved SNPs was represented by 1) the base 10 logarithm of the BF (log_{10}BF) and 2) the attributable
As already mentioned, calculation of BFs requires assumptions about effect sizes. We calculated frequentist estimates of the logistic regression coefficients over the corresponding entire chromosomes using NARAC data. Density plots and goodness-of-fit tests of the frequentist estimates of log_{10}GRR_{hom} and log_{10}GRR_{het} for the eight investigated chromosomes indicated that GRR variation was better represented by the median absolute deviation (MAD) than by the standard deviation (data not shown). Therefore, we assumed N(0, MAD^{2}) priors on the logarithms of the genetic effects.
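To make the choice of the MAD over the standard deviation concrete, a minimal sketch of the computation follows. The function name and the `scale` argument are hypothetical; whether the MAD was rescaled (for instance by the constant 1.4826, which makes it comparable to the standard deviation under normality) is not stated in this text, so the default here is the unscaled value.

```python
from statistics import median, stdev


def median_absolute_deviation(values, scale=1.0):
    """Median of the absolute deviations from the median, times `scale`."""
    values = list(values)
    center = median(values)
    return scale * median([abs(v - center) for v in values])


# A single extreme estimate inflates the standard deviation but barely
# moves the MAD, which is why the MAD is the more robust spread measure.
estimates = [1, 2, 3, 4, 100]          # toy values, not study data
robust_spread = median_absolute_deviation(estimates)   # equals 1
classical_spread = stdev(estimates)                    # far larger than 1
```

The resulting value would then play the role of the prior standard deviation in an N(0, MAD^{2}) prior.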
A bagplot is a bivariate generalization of the nonparametric univariate boxplot
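The bagplot is built on the halfspace (Tukey) depth of each point: the "bag" encloses the 50% of observations with the greatest depth, and points outside a fence obtained by inflating the bag by a factor of 3 are flagged as outliers. The sketch below approximates the halfspace depth of a point in the plane by scanning a finite grid of directions. It is an illustrative approximation with hypothetical names, not the exact algorithm used by bagplot software.

```python
import math


def halfspace_depth(point, data, n_directions=360):
    """Approximate Tukey depth of `point` within the 2-D sample `data`:
    the smallest number of observations in any closed half-plane whose
    boundary passes through `point` (minimised over a direction grid)."""
    tx, ty = point
    depth = len(data)
    for k in range(n_directions):
        angle = 2.0 * math.pi * k / n_directions
        ux, uy = math.cos(angle), math.sin(angle)
        # Count observations on the non-negative side of the line through
        # `point` with normal vector (ux, uy).
        count = sum(1 for (x, y) in data
                    if (x - tx) * ux + (y - ty) * uy >= 0.0)
        depth = min(depth, count)
    return depth


# Toy example: four corners of a square plus its centre.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
centre_depth = halfspace_depth((0.5, 0.5), sample)
corner_depth = halfspace_depth((0.0, 0.0), sample)
```

In the toy sample the central point is deeper than any corner, so it would sit inside the bag while extreme points are candidates for the outlying region. Applied to the present setting, each point would be a pair of log_{10}(BF) and attributable sibling relative risk for one SNP.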
Results
Theoretical relationship between Bayes factor and attributable risk
Table
Allele frequencies at maximum
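| GRR_{hom} | Dominant model: Max log_{10}(BF) | Dominant model: Max λ_{s} | Recessive model: Max log_{10}(BF) | Recessive model: Max λ_{s} | Additive model: Max log_{10}(BF) | Additive model: Max λ_{s} |
|---|---|---|---|---|---|---|
| 1.5 | 0.24 | 0.21 | 0.70 | 0.66 | 0.39 | 0.40 |
| 2 | 0.22 | 0.17 | 0.64 | 0.61 | 0.38 | 0.33 |
| 3 | 0.20 | 0.13 | 0.61 | 0.54 | 0.33 | 0.25 |
| 4 | 0.17 | 0.10 | 0.59 | 0.48 | 0.26 | 0.20 |
| 5 | 0.16 | 0.08 | 0.57 | 0.45 | 0.24 | 0.17 |
| 10 | 0.14 | 0.05 | 0.49 | 0.33 | 0.17 | 0.09 |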
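The allele frequencies at which the attributable sibling relative risk peaks can be explored numerically. The sketch below assumes Hardy-Weinberg proportions and the classical single-locus variance-component expression, λ_{s} = 1 + (V_A/2 + V_D/4)/K^{2}; the derivation used in the article is truncated in this text, so this formula is an assumption, as are the function names and the definition of the additive model as GRR_{het} = (1 + GRR_{hom})/2. Under these assumptions, a grid search for GRR_{hom} = 2 returns 0.17 (dominant), 0.61 (recessive), and 0.33 (additive), in line with the tabulated values.

```python
def attributable_sibling_rr(p, grr_het, grr_hom):
    """Sibling relative risk attributable to one biallelic locus with risk
    allele frequency p, assuming Hardy-Weinberg proportions."""
    q = 1.0 - p
    # Mean risk relative to the reference genotype (baseline risk cancels).
    mean_risk = q * q + 2.0 * p * q * grr_het + p * p * grr_hom
    # Average effect of an allele substitution.
    alpha = p * (grr_hom - grr_het) + q * (grr_het - 1.0)
    var_additive = 2.0 * p * q * alpha ** 2
    var_dominance = (p * q * (grr_hom - 2.0 * grr_het + 1.0)) ** 2
    return 1.0 + (0.5 * var_additive + 0.25 * var_dominance) / mean_risk ** 2


def grr_het_for(model, grr_hom):
    """Heterozygote relative risk implied by each penetrance model."""
    if model == "dominant":
        return grr_hom
    if model == "recessive":
        return 1.0
    if model == "additive":
        return (1.0 + grr_hom) / 2.0
    raise ValueError(f"unknown model: {model}")


def allele_frequency_at_max(model, grr_hom, n_grid=100):
    """Risk allele frequency on a grid of width 1/n_grid that maximises
    the attributable sibling relative risk."""
    grr_het = grr_het_for(model, grr_hom)
    frequencies = [i / n_grid for i in range(1, n_grid)]
    return max(frequencies,
               key=lambda p: attributable_sibling_rr(p, grr_het, grr_hom))


for model in ("dominant", "recessive", "additive"):
    print(model, allele_frequency_at_max(model, 2.0))
```

The same grid could be reused to locate the frequency that maximises the expected log_{10}(BF), given a sample size and a BF routine, which this sketch does not attempt.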
Bayes factors and attributable sibling relative risks in the investigated regions
The analysis of complete chromosomes by frequentist logistic regression resulted in 2 (GRR_{hom}, GRR_{het}) × 8 (chromosomes 1, 4, 6, 7, 10, 13, 21, and 22) sets of genetic effects. Table
Figure
Scatterplots of log_{10 }values of Bayes factors (log10(BF)) and attributable sibling relative risks (
Discussion
This study explored the advantage of combining BF, a measure of statistical evidence of association, and the attributable
The representation of
The present study made use of Bayesian and robust statistics. The Bayesian approach has been overlooked in the analysis of GWA studies, where the adjustment for multiple testing, the relationship between power and statistical significance, and the selection of disease models are important issues
Conclusion
The association results from GWA studies are usually summarized by a measure of evidence of association (frequentist or Bayesian probability values), which does not reflect the contribution of the identified signals to familial aggregation. We propose here a two-dimensional characterization of genetic association consisting of the attributable
List of abbreviations used
BF: Bayes factor; GRR: Genotype relative risk; GWA: Genome-wide association;
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JLB designed the study, performed the statistical analyses, and drafted the manuscript. CF and KH contributed to the study design and critically revised the manuscript. AS, NC, RH, LB, and JCC made substantial contributions to the interpretation of results. All authors read and approved the final manuscript.
Acknowledgements
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. JLB was supported by Deutsche Krebshilfe and the Swedish Cancer Society. RH was supported by NGFN 2 (SMPGEM, PGES30T09), funded by the BMBF.
This article has been published as part of