Abstract
Background
Simpson's paradox is sometimes referred to in the areas of epidemiology and clinical research. It can also be found in metaanalysis of randomized clinical trials. However, though readers are able to recalculate examples from hypothetical as well as real data, they may have problems to easily figure where it emerges from.
Method
First, two kinds of plots are proposed to illustrate the phenomenon graphically, a scatter plot and a line graph. Subsequently, these can be overlaid, resulting in a overlay plot. The plots are applied to the recent large metaanalysis of adverse effects of rosiglitazone on myocardial infarction and to an example from the literature. A large set of metaanalyses is screened for further examples.
Results
As noted earlier by others, occurrence of Simpson's paradox in the metaanalytic setting, if present, is associated with imbalance of treatment arm size. This is well illustrated by the proposed plots. The rosiglitazone metaanalysis shows an effect reversion if all trials are pooled. In a sample of 157 metaanalyses, nine showed an effect reversion after pooling, though nonsignificant in all cases.
Conclusion
The plots give insight on how the imbalance of trial arm size works as a confounder, thus producing Simpson's paradox. Readers can see why metaanalytic methods must be used and what is wrong with simple pooling.
Background
Simpson's paradox, also known as the ecological effect, was first described by Yule in 1903 [1] and is named after Simpson's article in 1951 [2]. It refers to the phenomenon that sometimes an association between two dichotomous variables is similar within subgroups of a population, say females and males, but changes its sign if the individuals of the subgroups are pooled without stratification. This is reflected in the title of a paper by Baker and Kramer ('Good for women, good for men, bad for people', [3]). There are numerous examples, particularly from the areas of epidemiology and social sciences, of associations strongly affected by observed or unobserved dichotomous variables [48]. Even a tale based on Simpson's paradox has been told [9]. The reason for its occurrence is the existence of an influencing variable that is not accounted for, often unobserved. Thus, it may seem that the effect is charactistic for observational studies and can be avoided by randomization.
This is not true, as was pointed out by others [1014]. As Altman and Deeks note, Simpson's paradox is not really a paradoxon, but a form of bias, resulting from heterogeneity in the data if not accounted for [10]. Often tables of hypothetical as well as real data examples are presented. However, though these examples are easily recalculated, there is a need for readers, especially clinicians and practitioners in other fields, to really understand the nature of the phenomenon.
Baker and Kramer proposed a plot, later called the BakerKramer (BK) plot, which was independently invented by others much earlier, for graphically illustrating Simpson's paradoxon [3,1315]. Their examples stem from hypothetical data. For this plot it is required that the influencing variable is dichotomous. In the setting of a metaanalysis, however, the main source of heterogeneity and thus the most important influential variable is wellknown and not dichotomous in general: it is the variable 'trial'. A perfect example of Simpson's paradox occurring in a metaanalysis of casecontrol studies is given by Hanley and Theriault [8]. In this metaanalysis all single trials show an increased risk for exposed individuals, while the pooled analysis reverses this effect.
As a (less perfect) example for metaanalysis of RCTs, we use a recent systematic review of the effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular diseases [16]. It stated a significant increase of myocardial infarctions in the rosiglitazone group. The authors found a Peto odds ratio 1.428 with 95 per cent confidence interval [1.031; 1.979] and pvalue 0.0321 (fixed effect model) [17]. This metaanalysis immediately raised a discussion not only about the safety of the drug, but also on methodological issues referring to potential heterogeneity, different followup times, the large number of trials with no or very few events and the imbalanced group sizes within many trials [1820]. A reanalysis of the data using several variants of the MantelHaenszel method found that the significance of the effect is questionable (odds ratio estimates between 1.26 and 1.36, most of them not significant) [18]. Though not consistently significant, metaanalysis (all methods) exhibits an excess of events in the treatment group (rosiglitazone), compared to the control group (any other regimen). For example, taking the risk difference (fixed effect model, MantelHaenszel method) results in a combined estimate of 0.002 (95 per cent confidence interval [0.000; 0.004] with pvalue 0.0549), corresponding to an estimated NNH (Number Needed to Harm) of about 489 patients.
One problem of this data is the large number of trials without any events. If the outcome is measured by the risk ratio or the odds ratio, these trials are often excluded from a metaanalysis because it is argued that they do not contribute any information about the magnitude of the treatment effect [21]. In order to use all available information, simple pooling of all single tables could be rather tempting. It is seemingly convenient here because of the considerable number of doublezero studies, despite of the general consensus that this is discouraged [22]. If pooling is done – in spite of this objection – for the main endpoint myocardial infarction (MI), we in fact surprisingly observe that the pooled 2 × 2table provides the contrary: the risk of MI for the treated individuals is 0.0055 and therefore less than for the control group (0.0059), see Table 1. The pooled odds ratio is 0.94 with 95 per cent confidence interval [0.69; 1.29] (pvalue 0.7109). This (nonsignificant) effect reversion, produced by pooling, was observed by another author who in the light of these found the results of the metaanalysis 'intriguing' [23]. It can be seen as a milder form of Simpson's paradox.
Table 1. Pooled data of rosiglitazone metaanalysis (full data see ref. [16])
In the next section, we first develop two kinds of plots to reveal and illustrate the mechanism of Simpson's paradox and effect reversion, using the rosiglitazone example. The third plot emerges from overlaying both plots. In the results section, we apply the plots to the data given by Hanley and Theriault [8] and discuss both methods and results. The paper is ended with conclusions.
Methods and Results
Simpson's paradox for continuous variables
The first idea to give a pictorial representation of the data is very simple. It comes from a graphic that serves for demonstrating the continuous version of the effect. For example, think of a correlation study where the data are grouped by a nominal variable Z, say study center. The conditional correlation (i.e. the correlation, given Z) of two continuous variables X, Y is assumed to be positive for all values of Z. Simpson's paradox occurs if, on the other hand, between different levels of Z holds 'the higher X, the lower is Y '. The appropriate plot best illustrating this is given by Figure 1. It is a grouped scatterplot that shows approximately parallel ascending regression lines within each level of Z, but a decreasing sequence of midpoints. Our goal is now to transfer this idea to the case of both X and Y being dichotomous.
Figure 1. Scatterplot of correlation between two continuous variables X and Y, grouped by a nominal variable Z. Different colors represent different levels of Z.
Simpson's paradox for dichotomous variables: a scatterplot
Let X, Y be dichotomous variables, where X is the treatment (1 = active, 0 = control) and Y is the outcome of interest (e.g., 1 = MI, 0 = no MI, where MI means myocardial infarction). The grouping variable is denoted as Z. In our metaanalytic example, Z ∈ {1,..., N} is the trial (N = 42 for the rosiglitazone metaanalysis). Simpson's paradox occurs, e.g., if within (most) studies, the event Y is more frequent in the active treatment group (X = 1), but between studies, those with larger treatment proportions (corresponding to higher X) tend to exhibit lower event probabilities (corresponding to lower Y). This is possible only if the proportions of patients treated with the active drug vary substantially over all trials. Exactly this – the noticeable imbalance of the groups in many of the studies – is a characteristic feature of the rosiglitazone metaanalysis, as is pointed out both in the original article [16] and several reactions thereon, e.g. [18]. The connection between group imbalance and the occurrence of ecological effects was pointed out earlier by some authors [8,10,11].
Figure 2 (left panel) is a straightforward analogue to the continuous plot described above. Instead of the (dichotomous) variables X and Y themselves, their observed frequencies are used. A simple scatterplot is presented that shows the overall event frequencies P(Y = 1Z = i) within the N trials i = 1,..., N versus the proportions P(X = 1Z = i) of patients undergoing the active treatment. The large dispersion of the treatment proportions, unusual for randomized trials, is clearly seen. The negative correlation between treatment proportion and event probability (indicated by the fitted unweighted regression line) could lead to the deceptive impression that the frequency of adverse events decreases if more patients receive active treatment, thus potentially producing Simpson's paradox.
Figure 2. Three plots elucidating effect reversion in rosiglitazone metaanalysis: (a) Scatterplot of fraction of events against proportion of patients in the active treatment group (left panel). (b) Line plot displaying risk differences within trials (middle panel). 0 = control group, 1 = active treatment group. (c) Overlay plot of scatterplot and line plot (right panel).
Simpson's paradox for dichotomous variables: a line plot
A second way to demonstrate this is given by Figure 2 (middle panel). It shows, according to the scatterplot for continuous variables, the actual treatment X (i.e, 0 or 1) on the xaxis and the event frequencies conditional on group (X = 0 or X = 1) and trial (Z), that is P(Y = 1X, Z) on the yaxis. Points belonging to the same trial are joined by a thin line, so that different lines indicate different trials. The slope of each withintrial line corresponds to the risk difference of this trial. The lines show a tendency to increase for most trials, revealing more adverse events in the active treatment group, in agreement with the published result of the metaanalysis.
In addition, three other lines are drawn. The green line joins the estimated mean event frequencies under control and under rosiglitazone, calculated within trials and averaged with equal weights for all trials. The blue line is similar, but the trials are now weighted with their precision, measured by the inverse sampling variance, calculated from a metaanalysis using the risk difference as outcome measure. Both lines increase slightly, reflecting what in average happens within trials.
The red line, however, calculated by simple collapsing all 2 × 2tables without stratification by trial, decreases. The reason is that there are many unbalanced trials with the treatment groups being larger than the control groups and simultaneously having the lowest event rates (see Figure 2, left panel). We can visualize this by adding further elements to this plot. The starting and ending points of the single trial lines are marked by diamonds with size proportional to the size of the control group and the treatment group of this trial, respectively. If this is done, the contribution of single trial arms to the red line becomes visible. In our example, we have a large trial with many events in the control group (left) and, on the other hand, many trials with a larger proportion of rosiglitazone patients having low event rates (righthand side).
Simpson's paradox for dichotomous variables: the overlay plot
The right panel plot of Figure 2 shows a combination of the scatterplot and the line plot. The circles from the scatterplot and the trialspecific lines from the line plot are overlaid, while the regression line, the colored lines and the diamonds are skipped for sake of clarity. The interpretation of the xaxis and the lines now is slightly changed. Values x on the xaxis are interpreted as all possible proportions of active treatment in a trial. The yvalues on the line belonging to a particular trial indicate the expected frequency of events in this trial, given X = x. If X = 0, this corresponds to the observed fraction of events in the control group (the intercept). If X = 1, the value provides the observed fraction of events in the treatment group of the trial. The i'th line is thus given by the linear equation
where the slope P_{Z = i}(Y = 1X = 1)  P_{Z = i}(Y = 1X = 0) is the risk difference observed in trial Z = i, as stated above. If we insert for x the proportion x_{0 }of patients actually treated in trial i, that is x_{0 }= P_{Z = i}(X = 1), we get
which results (after straightforward simplification) in y_{0 }= P_{Z = i}(Y = 1), the overall frequency of events in trial i. These values are marked as the circles on the lines in the righthand panel, which are the same as those on the scatterplot (left panel). This equality corresponds to equation (1) in [4].
Application
We apply the plots to the example of metaanalysis of casecontrol studies given by Hanley and Theriault (data in reference [8]). The cases are children with leukemia, the exposition of interest being the presence of a high voltage power line within 100 m of the residence. Figure 3 displays the plots for this example. The yaxes are logittransformed, because effect is measured as odds ratio. The scatterplot (lefthand panel) shows that the proportion of exposed (children living near a power line, here expressed as log odds) was higher in studies with a lower casecontrol ratio. The line plot (middle panel) displays that within all studies the exposition is slightly associated with leukemia, likewise for the stratified metaanalysis (green and blue line), but in the pooled sample (red line) the direction of association is reversed. The diamonds disclose how the large case and control groups pull the red line in the opposite direction. A direct overlay of these plots would not make sense, because when using a nonlinear transformation of the yaxis, the circles of the scatterplot do not lie exactly on the lines of the line plot. Instead, by subjecting equation (1) to the logit transformation we get a curved counterpart to the overlay plot. This is shown in the righthand panel of Figure 3.
Figure 3. Three plots illustrating Simpson's paradox in a metaanalysis of casecontrol studies: (a) Scatterplot of frequency of exposition (on a log odds scale) against proportion of cases (left panel). (b) Line plot displaying log odds ratios within studies (middle panel). 0 = control group, 1 = case group. (c) Curved overlay plot (right panel).
Discussion
The example of the rosiglitazone metaanalysis illustrates that an ecological effect can occur even if all studies are randomized clinical trials. The scatterplot, applied to this example, shows that the myocardial infarction rate is the lower, the higher the proportion of patients in the active treatment groups is. This is no effect of the treatment, but an artefact of the studies included in this metaanalysis. The large majority of treated patients in some trials is explained by the fact that the authors pooled multiple groups of patients receiving rosiglitazone, where applicable [16]. On the other hand, many of these studies had only shorttime followup, so that there were only few events observed. Casually, we note that this kind of heterogeneity in study design is present although there was no indication of statistical heterogeneity of the treatment effect on any scale, as measured in terms of τ^{2}, H or I^{2}[24]. These measures do not capture heterogeneity in other respect. This taken into account, the result of the metaanalysis that more adverse events are attributed to treatment than to control, as claimed to be significant in [16], was questioned by others [18].
In general, even a strong correlation contrary to the withinstudy association does not necessarily cause an effect reversion. This happens only if the disparities of the treatment arm sizes are large enough to outbalance the treatment effect in the single trials. This can be judged by inspection of the line plot. The line plot displays the treatment effect in each single study, as the slope of each line corresponds to the treatment effect measured in this study. The slope of the green line is the (uniformly weighted) mean treatment effect, that of the blue line the weighted mean treatment effect, the latter corresponding to the result of a metaanalysis. This kind of plot is not restricted to the risk difference, as the second example shows. Rather, it is easily generalized to a plot for the risk ratio or the odds ratio or other measures of treatment effect, such as the arcsine difference [25], by using the log scale, the logit scale, or the arcsine scale for the yaxis, respectively.
If the yaxis is not transformed, the plots can be overlaid. At first glance, the overlay plot is evocative of the socalled BKplot [3,1315]. It was first demonstrated using a hypothetical situation with only two groups (males and females), with the female fraction of patients as xaxis and the two lines corresponding to the two treatments [3]. The BKplot was applied, for example, to medical school admission data [26]. There is, however, a fundamental difference between our overlay plot and the BKplot which is elucidated in Table 2. In the overlay plot, the xaxis represents the variable 'treatment', that is, proportions of patients treated with the active treatment, and the lines correspond to any number of strata (here trials). In the BKplot, however, x represents a binary confounder (i.e., proportions of patients belonging to one of two subgroups), and the lines correspond to the treatments. In fact, the BKplot was originally introduced to elucidate Simpson's paradox for the simplest case that both the treatment variable and the confounding variable are binary. The plot then contains only two lines and two circles. Insight comes from comparing the position of the two circles on these lines: If Simpson's paradox is working, the lines have the same direction, do not intersect, and the circle on the lower line lies higher than the circle on the upper line. This method of pairwise comparing circles does not work in the context of a large and maybe heterogeneous metaanalysis. The confounder, here the trial, is not binary. Moreover, Simpson's paradox in metaanalysis uses to occur in a generalized form: We do not presuppose that the effects within all studies have the same direction. An effect reversion is identified if the sign of the pooled effect differs from that of the withinstudy treatment effect, estimated using metaanalytic methods.
Table 2. Overlay plot compared to BakerKramer plot [3]
As mentioned before, looking at the scatterplot or the overlay plot alone does not suffice, because a strong association between the proportion of patients treated and the event frequency in the direction opposite to the treatment effect is not sufficient for an effect reversion. The essential information is given by the line plot or by using the whole triplet of plots.
In addition, we screened a large set metaanalyses for finding further examples of this phenomenon. This data set, consisting of 157 metaanalyses with binary endpoints and two treatment groups was kindly provided by Peter Jüni who had collected the data at the Department of Social and Preventive Medicine, University of Berne, Switzerland. We had formerly used these data for a study on publication bias [27]. For each metaanalysis, a 'Simpson check' is carried out by comparing the sign of the result of the pooled analysis to the sign of the metaanalytic result, using the risk difference (without loss of generality). We found that in 9 out of all 157 metaanalyses (5.7%) the sign changed. However, in all these examples the treatment effect was far from being significant, and the confidence intervals of the metaanalytic and the pooled estimate overlapped largely. Hence the change of the sign was of no statistical importance.
Conclusion
The rosiglitazone example illustrates that an ecological effect (Simpson's paradox) can occur even when all studies are randomized clinical trials. However, as our empirical study shows, this is not a common phenomenon. When it occurs, it is caused by strong imbalance of the proportions allocated to the active and control treatment in the trials included in the metaanalysis. The usual measures of heterogeneity on the treatment effect scale are not sensitive against this kind of heterogeneity.
In our opinion, the plots proposed here serve to clarify what is going on beyond the calculations. Taken together, they help the reader to understand what is behind Simpson's paradox if he faces it in a metaanalysis. The R code producing the plots is available from the first author on request [28].
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
GR conceived the proposed plots and drafted the manuscript. MS contributed the curved overlay plot and added to the writing. Both authors read and approved the final manuscript.
Acknowledgements
GR is funded by Deutsche Forschungsgemeinschaft (FOR 534 Schw 821/22). The authors wish to thank the two referees for helpful comments on the manuscript.
References

Yule G: Notes on the theory of association of attributes of statistics.
Biometrika 1903, 2:121134. Publisher Full Text

Simpson E: The interpretation of interaction in contingency tables.

Baker SG, Kramer BS: Good for women, good for men, bad for people: Simpson's paradox and the importance of sexspecific analysis in observational studies.
Journal of Women's Health and GenderBased Medicine 2001, 10(9):867872. PubMed Abstract  Publisher Full Text

Greenland S, Morgenstern H: Ecological bias, confounding, and effect modification.
Int J Epidemiol 1989, 18:269274. PubMed Abstract  Publisher Full Text

Julious SA, Mullee MA: Confounding and Simpson's paradox.
BMJ 1994, 309:14801481. PubMed Abstract  Publisher Full Text

Appleton D, French J, Vanderpump M: Ignoring a covariate: An example of Simpson's paradox.
The American Statistician 1996, 50(4):340341. Publisher Full Text

Reintjes R, de Boer A, van Pelt W, de Groot JM: Simpson's paradox: An example from hospital epidemiology.
Epidemiology 2000, 11:8183. PubMed Abstract  Publisher Full Text

Hanley JA, Theriault G: Simpson's paradox in metaanalysis.
Epidemiology 2000, 11(5):613. PubMed Abstract  Publisher Full Text

Significance 2007, 2:4748. Publisher Full Text

Altman DG, Deeks JJ: Metaanalysis, Simpson's paradox, and the number needed to treat.
BMC Medical Research Methodology 2002, 2:3. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Cates CJ: Simpson's paradox and calculation of number needed to treat from metaanalysis.
BMC Medical Research Methodology 2002, 2():1. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Lievre M, Cucherat M, Leizorovicz A: Pooling, metaanalysis, and the evaluation of drug safety.
Current Controlled Trials in Cardiovascular Medicine 2002, 3:6. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Baker SG, Kramer BS: The transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C?
BMC Medical Research Methodolology 2002, 2:13. BioMed Central Full Text

Baker SG, Kramer BS: Randomized trials, generalizability, and metaanalysis: graphical insights for binary outcomes.
BMC Medical Research Methodology 2003, 3:10. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Nissen SE, Wolski K: Effect of Rosiglitazone on the risk of myocardial infarction and death from cardiovascular diseases.
NEJM 2007, 356(24):24572471. PubMed Abstract  Publisher Full Text

Yusuf S, Peto R, Lewis J, Collins R, Sleight P: Beta blockade during and after myocardial infarction: An overview of the randomized trials.
Progress in Cardiovascular Diseases 1985, 27:335371. PubMed Abstract  Publisher Full Text

Diamond GA, Bax L, Kaul S: Uncertain Effects of Rosiglitazone on the Risk for Myocardial Infarction and Cardiovascular Death.
Annals of Internal Medicine 2007, 147(8):578581. PubMed Abstract  Publisher Full Text

Shuster J, Jones L, Salmon D: Fixed vs random effects metaanalysis in rare event studies: The Rosiglitazone link with myocardial infarction and cardiac death.
Statistics in Medicine 2007, 26:43754385. PubMed Abstract  Publisher Full Text

Carpenter JR, Rücker G, Schwarzer G: Letter to the Editor.
Statistics in Medicine 2007.
[DOI: 10.1002/sim.3173].
PubMed Abstract  Publisher Full Text 
Bradburn MJ, Deeks JJ, Berlin JA, Localio AR: Much ado about nothing: a comparison of the performance of metaanalytical methods with rare events.
Statistics in Medicine 2007, 26:5377. PubMed Abstract  Publisher Full Text

Whitehead A: Metaanalysis of controlled clinical trials. Wiley; 2002.

Bracken M: Rosiglitazone and cardiovascular risk.
N Engl J Med 2007, 357(9):937938.
[Author reply 939–940].
PubMed Abstract  Publisher Full Text 
Higgins JPT, Thompson SG: Quantifying heterogeneity in a metaanalysis.
Statistics in Medicine 2002, 21:15391558. PubMed Abstract  Publisher Full Text

Rücker G, Schwarzer G, Carpenter JR: Arcsine test for publication bias in metaanalyses with binary outcomes.
Statistics in Medicine 2008, 27(5):746763. PubMed Abstract  Publisher Full Text

Wainer H, Brown LM: Two Statistical Paradoxes in the Interpretation of Group Differences: Illustrated with Medical School Admission and Licensing Data.

Carpenter JR, Schwarzer G, Rücker G, Künstler R: Empirical evaluation shows the Copas selection model provides a useful summary in 80% of metaanalyses.
2007, in press.

R Development Core Team: [http://www.Rproject.org] webcite
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2006.
[ISBN 3900051070].
Prepublication history
The prepublication history for this paper can be accessed here: