Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, Germany

Abstract

Background

Simpson's paradox is sometimes referred to in epidemiology and clinical research. It can also be found in meta-analyses of randomized clinical trials. However, although readers can recalculate examples from hypothetical as well as real data, they may find it hard to see where the paradox comes from.

Method

First, two kinds of plots are proposed to illustrate the phenomenon graphically: a scatterplot and a line plot. These can then be overlaid, resulting in an overlay plot. The plots are applied to the recent large meta-analysis of adverse effects of rosiglitazone on myocardial infarction and to an example from the literature. A large set of meta-analyses is screened for further examples.

Results

As noted earlier by others, occurrence of Simpson's paradox in the meta-analytic setting, if present, is associated with imbalance of treatment arm size. This is well illustrated by the proposed plots. The rosiglitazone meta-analysis shows an effect reversion if all trials are pooled. In a sample of 157 meta-analyses, nine showed an effect reversion after pooling, though non-significant in all cases.

Conclusion

The plots give insight into how the imbalance of trial arm sizes acts as a confounder, thus producing Simpson's paradox. Readers can see why meta-analytic methods must be used and what is wrong with simple pooling.

Background

Simpson's paradox, also known as the ecological effect, was first described by Yule in 1903

This is not true, as was pointed out by others

Baker and Kramer proposed a plot, later called the Baker-Kramer (BK) plot, which had been invented independently by others much earlier, for graphically illustrating Simpson's paradox

As a (less perfect) example for meta-analysis of RCTs, we use a recent systematic review of the effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular diseases

One problem with these data is the large number of trials without any events. If the outcome is measured by the risk ratio or the odds ratio, these trials are often excluded from a meta-analysis because it is argued that they do not contribute any information about the magnitude of the treatment effect

Pooled data of the rosiglitazone meta-analysis (for full data see ref. [16])

| Group | Events | Total | Fraction |
|---|---|---|---|
| Rosiglitazone group | 86 | 15556 | 0.5528% |
| Control group | 72 | 12277 | 0.5865% |
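The arithmetic behind the pooled table can be reproduced in a few lines. This is only a sketch of the naive (collapsed) comparison, using the four counts from the table; the variable names are illustrative and this is not the authors' R code.

```python
# Pooled counts from the table (all trials collapsed into a single 2 x 2 table).
events_rosi, total_rosi = 86, 15556
events_ctrl, total_ctrl = 72, 12277

frac_rosi = events_rosi / total_rosi
frac_ctrl = events_ctrl / total_ctrl

# Crude effect measures that ignore the stratification by trial.
crude_risk_difference = frac_rosi - frac_ctrl
crude_risk_ratio = frac_rosi / frac_ctrl

print(f"rosiglitazone: {100 * frac_rosi:.4f}%")
print(f"control:       {100 * frac_ctrl:.4f}%")
print(f"crude risk difference: {100 * crude_risk_difference:+.4f} percentage points")
print(f"crude risk ratio:      {crude_risk_ratio:.3f}")
```

The crude risk difference is negative, so simple pooling makes rosiglitazone look slightly protective, which is the direction that the stratified analysis contradicts.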

In the next section, we first develop two kinds of plots to reveal and illustrate the mechanism of Simpson's paradox and effect reversion, using the rosiglitazone example. The third plot emerges from overlaying both plots. In the results section, we apply the plots to the data given by Hanley and Theriault

Methods and Results

Simpson's paradox for continuous variables

The first idea for a pictorial representation of the data is very simple. It comes from a graphic used to demonstrate the continuous version of the effect. For example, think of a correlation study where the data are grouped by a nominal variable


**Scatterplot of correlation between two continuous variables X and Y, grouped by a nominal variable Z.** Different colors represent different levels of Z.
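The continuous version of the effect is easy to reproduce with made-up numbers. In the sketch below, the three groups and their values are invented for illustration (they are not the data behind the figure): within every level of Z the correlation between X and Y is positive, yet the correlation over the combined data is negative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: Y rises with X inside each group,
# but groups with larger X sit at a lower overall level of Y.
groups = {
    "z1": ([0, 1, 2], [10, 11, 12]),
    "z2": ([3, 4, 5], [6, 7, 8]),
    "z3": ([6, 7, 8], [2, 3, 4]),
}

within = {z: pearson(xs, ys) for z, (xs, ys) in groups.items()}
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
pooled = pearson(all_x, all_y)

print("within-group correlations:", within)
print("pooled correlation:", round(pooled, 3))
```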

Simpson's paradox for dichotomous variables: a scatterplot

Let

Figure


**Three plots elucidating effect reversion in rosiglitazone meta-analysis: (a) Scatterplot of fraction of events against proportion of patients in the active treatment group (left panel).** (b) Line plot displaying risk differences within trials (middle panel). 0 = control group, 1 = active treatment group. (c) Overlay plot of scatterplot and line plot (right panel).
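The coordinates of the scatterplot in panel (a) can be sketched as follows: each trial contributes one point, with the proportion of patients in the active treatment group on the horizontal axis and the overall fraction of events on the vertical axis. The `Trial` type and the three trials are hypothetical, chosen only to show the pattern; they are not the rosiglitazone data.

```python
from typing import NamedTuple

class Trial(NamedTuple):
    name: str
    events_ctrl: int
    n_ctrl: int
    events_trt: int
    n_trt: int

# Hypothetical trials, invented for illustration.
TRIALS = [
    Trial("A", 20, 200, 24, 200),
    Trial("B", 4, 100, 15, 300),
    Trial("C", 1, 100, 10, 500),
]

def scatter_point(trial):
    """Return (proportion treated, overall event fraction) for one trial."""
    n_total = trial.n_ctrl + trial.n_trt
    x = trial.n_trt / n_total
    y = (trial.events_ctrl + trial.events_trt) / n_total
    return x, y

points = [scatter_point(t) for t in TRIALS]
for trial, (x, y) in zip(TRIALS, points):
    print(f"trial {trial.name}: proportion treated = {x:.3f}, event fraction = {y:.4f}")
```

In these invented data the event fraction falls as the proportion treated rises, although the treated arm has the higher risk within every single trial.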

Simpson's paradox for dichotomous variables: a line plot

A second way to demonstrate this is given by Figure

In addition, three other lines are drawn. The green line joins the estimated mean event frequencies under control and under rosiglitazone, calculated within trials and averaged with equal weights for all trials. The blue line is similar, but the trials are now weighted by their precision, measured by the inverse sampling variance, calculated from a meta-analysis using the risk difference as outcome measure. Both lines increase slightly, reflecting what happens on average within trials.
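The averaged lines can be sketched in code. The example below is one plausible reading of the description, not the authors' implementation: the inverse-variance weight uses the usual large-sample variance of a risk difference, the trial data are hypothetical, and trials with zero events in an arm (which would give a zero variance term) are assumed absent. A line from collapsing all tables is included for comparison.

```python
# Hypothetical trials: (events_ctrl, n_ctrl, events_trt, n_trt).
trials = [
    (20, 200, 24, 200),
    (4, 100, 15, 300),
    (1, 100, 10, 500),
]

def arm_risks(trial):
    """Event fractions in the control and the treated arm of one trial."""
    e0, n0, e1, n1 = trial
    return e0 / n0, e1 / n1

def rd_variance(trial):
    """Large-sample variance of the risk difference (assumes no zero-event arms)."""
    e0, n0, e1, n1 = trial
    p0, p1 = e0 / n0, e1 / n1
    return p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1

def weighted_line(trials, weights):
    """Weighted mean event fraction under control (x = 0) and treatment (x = 1)."""
    total = sum(weights)
    y0 = sum(w * arm_risks(t)[0] for t, w in zip(trials, weights)) / total
    y1 = sum(w * arm_risks(t)[1] for t, w in zip(trials, weights)) / total
    return y0, y1

# Equal weights (green line) and inverse-variance weights (blue line).
green = weighted_line(trials, [1.0] * len(trials))
blue = weighted_line(trials, [1.0 / rd_variance(t) for t in trials])

# Collapsing all 2 x 2 tables without stratification by trial.
collapsed = (
    sum(t[0] for t in trials) / sum(t[1] for t in trials),
    sum(t[2] for t in trials) / sum(t[3] for t in trials),
)

for label, (y0, y1) in [("equal weights", green), ("inverse variance", blue), ("collapsed", collapsed)]:
    print(f"{label:17s} control = {y0:.4f}, treated = {y1:.4f}, slope = {y1 - y0:+.4f}")
```

With these invented numbers both averaged lines slope upward, while the collapsed line slopes downward.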

The red line, however, calculated by simply collapsing all 2 × 2 tables without stratification by trial,

Simpson's paradox for dichotomous variables: the overlay plot

The right panel plot of Figure

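With $X$ denoting treatment (0 = control, 1 = active), $Y$ the event indicator and $Z$ the trial, the line for trial $i$ is

$$y = P_{Z=i}(Y=1 \mid X=0) + x\,\bigl[P_{Z=i}(Y=1 \mid X=1) - P_{Z=i}(Y=1 \mid X=0)\bigr],$$

where the slope $P_{Z=i}(Y=1 \mid X=1) - P_{Z=i}(Y=1 \mid X=0)$ is the risk difference in trial $i$. At the proportion $x_0$ of patients actually treated in trial $i$, $x_0 = P_{Z=i}(X=1)$, the line takes the value

$$y_0 = P_{Z=i}(Y=1 \mid X=0) + P_{Z=i}(X=1)\,\bigl[P_{Z=i}(Y=1 \mid X=1) - P_{Z=i}(Y=1 \mid X=0)\bigr],$$

which results (after straightforward simplification) in $y_0 = P_{Z=i}(Y=1)$, the overall event fraction in trial $i$.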
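The property that makes the overlay possible can be checked numerically: for each trial, the straight line joining the control event fraction (at 0) and the treated event fraction (at 1), evaluated at the proportion of patients actually treated, equals the trial's overall event fraction. The sketch below uses exact rational arithmetic and hypothetical trial counts; the function names are illustrative.

```python
from fractions import Fraction

# Hypothetical trials: (events_ctrl, n_ctrl, events_trt, n_trt).
trials = [
    (20, 200, 24, 200),
    (4, 100, 15, 300),
    (1, 100, 10, 500),
]

def line_value_at_proportion_treated(trial):
    """Height of the trial's line at x0 = proportion of patients treated."""
    e0, n0, e1, n1 = trial
    p0 = Fraction(e0, n0)          # event fraction under control (x = 0)
    p1 = Fraction(e1, n1)          # event fraction under treatment (x = 1)
    x0 = Fraction(n1, n0 + n1)     # proportion of patients treated
    return p0 + x0 * (p1 - p0)

def overall_event_fraction(trial):
    """Event fraction in the trial, ignoring treatment."""
    e0, n0, e1, n1 = trial
    return Fraction(e0 + e1, n0 + n1)

line_hits_point = [
    line_value_at_proportion_treated(t) == overall_event_fraction(t) for t in trials
]
print(line_hits_point)
```

Because fractions are exact, the equality holds without rounding tolerance, so every line passes through its own scatterplot point.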

Application

We apply the plots to the example of a meta-analysis of case-control studies given by Hanley and Theriault (data in reference


**Three plots illustrating Simpson's paradox in a meta-analysis of case-control studies: (a) Scatterplot of frequency of exposure (on a log odds scale) against proportion of cases (left panel).** (b) Line plot displaying log odds ratios within studies (middle panel). 0 = control group, 1 = case group. (c) Curved overlay plot (right panel).

Discussion

The example of the rosiglitazone meta-analysis illustrates that an ecological effect can occur even if all studies are randomized clinical trials. The scatterplot, applied to this example, shows that the higher the proportion of patients in the active treatment groups, the lower the myocardial infarction rate. This is not an effect of the treatment, but an artefact of the studies included in this meta-analysis. The large majority of treated patients in some trials is explained by the fact that the authors pooled multiple groups of patients receiving rosiglitazone, where applicable.

In general, even a strong correlation contrary to the within-study association does not necessarily cause an effect reversion. This happens only if the disparities of the treatment arm sizes are large enough to outweigh the treatment effect in the single trials. This can be judged by inspection of the line plot. The line plot displays the treatment effect in each single study, as the slope of each line corresponds to the treatment effect measured in this study. The slope of the green line is the (uniformly weighted) mean treatment effect, and that of the blue line is the weighted mean treatment effect, the latter corresponding to the result of a meta-analysis. This kind of plot is not restricted to the risk difference, as the second example shows. Rather, it is easily generalized to a plot for the risk ratio or the odds ratio or other measures of treatment effect, such as the arcsine difference
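That imbalance alone is not enough can be illustrated with a small hypothetical calculation. In the sketch below, two invented trials have fixed within-trial risks (treatment always raises the risk), and only the arm sizes change; the collapsed risk difference is computed from expected event counts. All names and numbers are assumptions made for illustration.

```python
# Within-trial risks (control, treated) for two hypothetical trials.
HIGH_RISK_TRIAL = (0.10, 0.12)
LOW_RISK_TRIAL = (0.01, 0.02)

def collapsed_risk_difference(design):
    """design: list of (risk_ctrl, n_ctrl, risk_trt, n_trt), one tuple per trial."""
    expected_ctrl = sum(p0 * n0 for p0, n0, _, _ in design)
    expected_trt = sum(p1 * n1 for _, _, p1, n1 in design)
    n_ctrl = sum(n0 for _, n0, _, _ in design)
    n_trt = sum(n1 for _, _, _, n1 in design)
    return expected_trt / n_trt - expected_ctrl / n_ctrl

def make_design(sizes_high, sizes_low):
    """Combine the fixed risks with (n_ctrl, n_trt) arm sizes for each trial."""
    (h0, h1), (l0, l1) = sizes_high, sizes_low
    return [
        (HIGH_RISK_TRIAL[0], h0, HIGH_RISK_TRIAL[1], h1),
        (LOW_RISK_TRIAL[0], l0, LOW_RISK_TRIAL[1], l1),
    ]

designs = {
    "balanced arms": make_design((200, 200), (200, 200)),
    "mild imbalance": make_design((200, 180), (200, 220)),
    "strong imbalance": make_design((300, 100), (100, 500)),
}

results = {name: collapsed_risk_difference(d) for name, d in designs.items()}
for name, rd in results.items():
    print(f"{name:17s} collapsed risk difference = {rd:+.4f}")
```

Under mild imbalance the collapsed difference shrinks but keeps its sign; only the strongly imbalanced design, with most treated patients in the low-risk trial, reverses it.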

If the

Overlay plot compared to Baker-Kramer plot [3]

| Element of the plot | Overlay plot | Baker-Kramer plot |
|---|---|---|
| x-axis | Treatment (proportion) | Binary confounder (proportion) |
| y-axis | Outcome | Outcome |
| Lines | Strata (here: trials) | Treatments |

As mentioned before, looking at the scatterplot or the overlay plot alone does not suffice, because a strong association between the proportion of patients treated and the event frequency in the direction opposite to the treatment effect is not sufficient for an effect reversion. The essential information is given by the line plot or by using the whole triplet of plots.

In addition, we screened a large set of meta-analyses to find further examples of this phenomenon. This data set, consisting of 157 meta-analyses with binary endpoints and two treatment groups, was kindly provided by Peter Jüni, who had collected the data at the Department of Social and Preventive Medicine, University of Berne, Switzerland. We had formerly used these data for a study on publication bias

Conclusion

The rosiglitazone example illustrates that an ecological effect (Simpson's paradox) can occur even when all studies are randomized clinical trials. However, as our empirical study shows, this is not a common phenomenon. When it occurs, it is caused by strong imbalance of the proportions allocated to the active and control treatment in the trials included in the meta-analysis. The usual measures of heterogeneity on the treatment effect scale are not sensitive to this kind of heterogeneity.

In our opinion, the plots proposed here serve to clarify what is going on behind the calculations. Taken together, they help readers to understand what lies behind Simpson's paradox when they encounter it in a meta-analysis. The R code producing the plots is available from the first author on request.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

GR conceived the proposed plots and drafted the manuscript. MS contributed the curved overlay plot and added to the writing. Both authors read and approved the final manuscript.

Acknowledgements

GR is funded by Deutsche Forschungsgemeinschaft (FOR 534 Schw 821/2-2). The authors wish to thank the two referees for helpful comments on the manuscript.
