Abstract
Background
If intervention A bests B in one randomized trial, and B bests C in another randomized trial, can one conclude that A is better than C? The problem was motivated by the planning of a randomized trial, where A is spiral-CT screening, B is x-ray screening, and C is no screening. On its surface, this would appear to be a straightforward application of the transitive principle of logic.
Methods
We extended the graphical approach for omitted binary variables that was originally developed to illustrate Simpson's paradox, applying it to hypothetical, but plausible, scenarios involving lung cancer screening, treatment for gastric cancer, and antibiotic therapy for clinical pneumonia.
Results
Graphical illustrations of the three examples show different ways the transitive fallacy for randomized trials can arise due to changes in an unobserved or unadjusted binary variable. In the most dramatic scenario, B bests C in the first trial, A bests B in the second trial, but C bests A at the time of the second trial.
Conclusion
Even with large sample sizes, combining results from a previous randomized trial of B versus C with results from a new randomized trial of A versus B will not guarantee correct inference about A versus C. A three-arm trial of A, B, and C would protect against this problem and should be considered when the sequential trials are performed in the context of changing secular trends in important omitted variables, such as therapy in cancer screening trials.
Background
Consider three statements: A, B, and C. In formal logic, if A implies B, and B implies C, then A implies C. This can be illustrated by a Venn diagram in which set A lies entirely within set B and set B lies entirely within set C. This implies set A lies entirely within set C. This is an example of transitivity. Extending this logical construct to the design and interpretation of clinical trials, one might conclude that (even with large sample sizes) if in a randomized trial, intervention A is shown superior to intervention B, and in another randomized trial intervention B is shown superior to C, then A will be shown superior to C in another randomized trial.
However, statistical association is not generally transitive. Consider three random variables: A, B, C. If A is positively correlated with B, and B is positively correlated with C, then A may or may not be positively correlated with C [2].
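The failure of transitivity for correlation can be checked with a small constructed example. The sketch below is an illustration of ours, not taken from the cited reference: it assumes two uncorrelated variables X and Y and sets A = X, B = X + Y, and C = Y, so that B shares information with both A and C while A and C share none.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Assumed toy data: every combination of x and y in {-1, +1},
# which makes x and y exactly uncorrelated.
x = [-1, -1, 1, 1]
y = [-1, 1, -1, 1]

a = x
b = [xi + yi for xi, yi in zip(x, y)]  # B depends on both X and Y
c = y

r_ab = pearson(a, b)
r_bc = pearson(b, c)
r_ac = pearson(a, c)

print(f"corr(A, B) = {r_ab:.2f}")  # positive
print(f"corr(B, C) = {r_bc:.2f}")  # positive
print(f"corr(A, C) = {r_ac:.2f}")  # zero
```

Here A and B are positively correlated, as are B and C, yet A and C are uncorrelated.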
To our knowledge, no one has investigated transitivity of results from separate randomized trials. In fact, for the sake of perceived efficiency and limited resources, the principle of transitivity is sometimes assumed in clinical trial design. The possibility of a transitive fallacy surfaced in discussions about a planned randomized clinical trial to assess the efficacy of low-dose spiral computed tomography (CT) for lung cancer screening.
The aim of this investigation is to use graphical methods to explore the conditions under which transitive inference for randomized trials does not hold. As the mathematician John Allen Paulos writes, "It's odd that logical acuity, rather than helping one to clarify statements, often reveals hidden ambiguities within them. Instead of leading one to form more conclusions, it makes clear that fewer conclusions are justified."[3]
Methods
We extended the graphic in [1,4] for illustrating Simpson's paradox. In the Simpson's paradox graphic, the horizontal axis is the fraction of subjects with an unobserved variable and the vertical axis is the probability of a binary outcome. One diagonal line in the plot represents the effect of the unobserved variable on the probability of outcome, given treatment A. A second diagonal line, which is lower and parallel to the first, represents the effect of the unobserved variable on the probability of outcome, given treatment B. The graphic shows that if the fraction with the unobserved variable differs between the groups receiving A and B (as in an observational study), then subjects receiving treatment B could have a higher probability of outcome than subjects receiving A, even though the line for A is higher than the line for B.
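The arithmetic behind the Simpson's paradox graphic can be sketched in a few lines. The intercepts, slope, and fractions below are hypothetical values chosen only for illustration; they do not come from the graphic in [1,4].

```python
# Hypothetical values only: none of these numbers come from the paper.

def outcome_prob(intercept, slope, frac_with_variable):
    """Height of a treatment's line at a given fraction with the unobserved variable."""
    return intercept + slope * frac_with_variable

SLOPE = 0.40        # shared slope, so the two lines are parallel
INTERCEPT_A = 0.30  # line for treatment A
INTERCEPT_B = 0.20  # line for treatment B, lower than A everywhere

# Observational study: the unobserved variable is rarer among subjects on A.
p_a_obs = outcome_prob(INTERCEPT_A, SLOPE, 0.2)  # about 0.38
p_b_obs = outcome_prob(INTERCEPT_B, SLOPE, 0.8)  # about 0.52

# Randomized trial: both arms share the same fraction (0.5 here).
p_a_rand = outcome_prob(INTERCEPT_A, SLOPE, 0.5)
p_b_rand = outcome_prob(INTERCEPT_B, SLOPE, 0.5)

print("Observational: B looks better than A:", p_b_obs > p_a_obs)
print("Randomized:    A is better than B:  ", p_a_rand > p_b_rand)
```

When the two groups sit at different points on the horizontal axis, the comparison reverses; when randomization places them at the same point, the ordering of the lines is recovered.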
For this investigation, we added a third treatment to the graphic and investigated three hypothetical, but plausible, scenarios, one involving lung cancer screening, one involving treatment for gastric cancer, and one involving antibiotics. Unlike the Simpson's paradox example, in all the scenarios there is an interaction between an unobserved binary variable and the intervention. For each scenario we considered two study designs: (1) separate randomized trials of B versus C and A versus B, and (2) a randomized three-arm trial of A, B, and C. In the discussion below, we emphasize that the trial sizes would be large enough to eliminate considerations of simple statistical variation or imprecision in measurement of outcome variables. We also emphasize that the results would hold if there were no bias from contamination or noncompliance.
Results
We present graphical illustrations of the transitive fallacy in three hypothetical, but plausible, scenarios.
Lung cancer screening with improved therapy (Figure 1)
Let A denote spiral-CT screening, B denote chest x-ray screening, and C denote the control group of no screening or "usual care." The endpoint is lung cancer mortality rate. A previous randomized trial of B versus C found similar lung cancer mortality rates for the two interventions [5,6]. Currently there is discussion of a new randomized trial to compare A and B. Suppose the new randomized trial shows similar cancer mortality rates for A and B. Would that constitute proof of similar cancer mortality rates for A and C?
The unobserved binary variable is an indicator of whether or not subjects received a (relatively) new (and effective) therapy after early detection. Although type of therapy is observable, it is not generally analyzed or reported in papers summarizing the results of screening trials. We suppose that the new therapy decreases the lung cancer mortality rates for A, B, and C, but at different rates (i.e. a quantitative interaction). In addition, unless there is substantial overdiagnosis, it is reasonable that the greatest decrease would occur with A due to earliest detection. Also, unless the therapy is effective at all stages of cancer, it is reasonable that the smallest decrease would occur with C due to late detection and greater total tumor burden.
We realistically assume that the percent of subjects who receive the new therapy has increased over time as randomized treatment trials establish its worth. For purposes of illustration, we specify that in the first trial of B versus C, 30 percent of the subjects receive the new therapy, and in the planned second trial (either A versus B, or A versus B versus C) 70 percent of the subjects will receive the new therapy.
Figure 1 shows a realistic set of outcomes. In the first trial the cancer mortality rate is similar for B and C. A second trial of A versus B also indicates similar cancer mortality rates (Figure 1, left). Under the transitive fallacy, one would incorrectly conclude that the cancer mortality rates for A and C are similar at the time of the second trial. However, when 70% of the subjects receive the new therapy, the three-arm trial correctly shows that A is substantially better than C (Figure 1, right).
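A numerical sketch may help make the geometry of this scenario concrete. The mortality rates below are hypothetical numbers of ours, not values read from Figure 1; they were chosen only so that the decrease with the new therapy is greatest for A and smallest for C, as supposed above, and so that the sequential trials give exactly equal rates.

```python
from math import isclose

# Hypothetical lung cancer deaths per 1,000 subjects (not from the paper):
# (rate if no one receives the new therapy, rate if everyone receives it)
LINES = {
    "A": (28.0, 10.0),  # spiral-CT screening: steepest decrease
    "B": (22.4, 12.4),  # chest x-ray screening
    "C": (20.0, 18.0),  # no screening: shallowest decrease
}

def mortality(arm, frac_new_therapy):
    """Mortality rate for an arm when a given fraction receives the new therapy."""
    none, everyone = LINES[arm]
    return none + (everyone - none) * frac_new_therapy

# Sequential design: B versus C at 30%, then A versus B at 70%.
first_trial = (mortality("B", 0.3), mortality("C", 0.3))
second_trial = (mortality("A", 0.7), mortality("B", 0.7))

# Three-arm design: A, B, and C all observed at 70%.
three_arm = {arm: mortality(arm, 0.7) for arm in "ABC"}

print("B vs C at 30%:", first_trial)
print("A vs B at 70%:", second_trial)
print("Three-arm at 70%:", three_arm)
print("B and C similar in trial 1:",
      isclose(first_trial[0], first_trial[1], abs_tol=1e-9))
print("A and B similar in trial 2:",
      isclose(second_trial[0], second_trial[1], abs_tol=1e-9))
print("A better than C at 70%:", three_arm["A"] < three_arm["C"])
```

Under these assumed numbers, each two-arm trial shows no difference, yet the three-arm comparison at 70% shows lower mortality for A than for C, because the line for C has moved relative to B between the two time points.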
Cancer treatment with improved supportive care (Figure 2)
A second hypothetical example involves treatment for gastric cancer. Let A denote radical gastrectomy/splenectomy, B denote "simple" gastrectomy, and C denote radiation. Suppose the endpoint is percent mortality over some time period and the unobserved covariate is effective supportive care. It is plausible that supportive care improves over time with better intensive care and better antibiotics for any infections that arise.
In an earlier period, when a small percentage of subjects receive effective supportive care, the more aggressive treatments carry substantially more treatment-related mortality. As shown in Figure 2 (left side), a randomized trial of B versus C during this earlier period demonstrates considerably higher mortality for B than C. In a later period, when a larger percentage of subjects receive effective supportive care, the mortality rates converge. As shown in Figure 2 (left side), a randomized trial of A versus B during the later period indicates similar mortality rates. Under the transitive fallacy, one would incorrectly conclude that the mortality rates for A and C differ substantially at the time of the second trial. However, when a large percentage of subjects receive effective supportive care, the three-arm trial correctly shows that mortality rates for A and C are similar (Figure 2, right).
Antibiotic treatment with change in percent gram positive (Figure 3)
A third hypothetical example involves the use of empiric antibiotics to treat clinical pneumonia. Suppose A is an antibiotic that treats gram positive organisms but not gram negative; B is an antibiotic that treats gram negative but not gram positive pneumonias; and C is an antibiotic that treats gram positive pneumonias better than A. The endpoint is fraction successfully treated. Suppose the percent of organisms that are gram positive is an unmeasured covariate. This is a realistic scenario given that the spectrum of bacterial infections can shift over time, or can differ from hospital to hospital at the same time.
Suppose a randomized trial of A versus B has been completed and investigators are considering a randomized trial of B versus C. (In this scenario, A versus B occurs prior to B versus C, even though it is depicted farther to the right on the graph.) We realistically assume the percent of organisms that are gram positive has decreased by the time of the second trial. For purposes of illustration suppose 80 percent of the subjects are gram positive in the first trial and only 10% are gram positive in the second trial.
In this situation, A bests B in a randomized trial of patients who mainly have gram-positive infections, and B bests C in a randomized trial of patients with mainly gram-negative infections (Figure 3, left). Under the transitive fallacy, one would incorrectly conclude that when 10% of the subjects have gram-positive infections, A would best C. However, the three-arm study correctly shows that when 10% of the subjects have gram-positive infections, C would best A (Figure 3, right).
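The reversal in this scenario can also be sketched numerically. The per-organism success probabilities below are assumptions made for illustration (the paper gives none); the only features taken from the text are that A and C act on gram-positive organisms, that B acts on gram-negative organisms, and that C is more effective than A.

```python
# Hypothetical success probabilities by organism type (not from the paper):
# (success if gram positive, success if gram negative)
SUCCESS = {
    "A": (0.8, 0.0),  # treats gram positive only
    "B": (0.0, 0.8),  # treats gram negative only
    "C": (0.9, 0.0),  # treats gram positive better than A
}

def fraction_treated(drug, frac_gram_positive):
    """Overall fraction successfully treated for a given case mix."""
    pos, neg = SUCCESS[drug]
    return pos * frac_gram_positive + neg * (1.0 - frac_gram_positive)

# First trial (A versus B): 80% of subjects are gram positive.
a_first, b_first = fraction_treated("A", 0.8), fraction_treated("B", 0.8)

# Second trial (B versus C): only 10% of subjects are gram positive.
b_second, c_second = fraction_treated("B", 0.1), fraction_treated("C", 0.1)

# Three-arm trial at the time of the second trial also observes A at 10%.
a_second = fraction_treated("A", 0.1)

print("Trial 1, A bests B:", a_first > b_first)
print("Trial 2, B bests C:", b_second > c_second)
print("Three-arm, C bests A:", c_second > a_second)
```

With these assumed values, A bests B in the first trial and B bests C in the second, but C bests A at the second trial's case mix, which is the opposite of the transitive conclusion.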
Figure 1. Screening for lung cancer. Hypothetical results are shown for A = spiral CT, B = x-ray, and C = no screening. In the sequential study, results for B and C are similar in the first trial (when 30% of the subjects receive the new therapy), and results for A and B are similar in the second trial (when 70% of the subjects receive the new therapy). However, as shown with the three-arm trial, it is incorrect to make the transitive inference that when 70% of the subjects receive the new therapy, the results for A and C would be similar.
Figure 2. Treatment for gastric cancer. Hypothetical results are shown for A = radical gastrectomy/splenectomy, B = "simple" gastrectomy, and C = radiation. In the sequential study, results for B and C differ in the first trial (when 30% of the subjects receive effective supportive care), and results for A and B are similar in the second trial (when 70% of the subjects receive effective supportive care). However, as shown with the three-arm trial, it is incorrect to make the transitive inference that, when 70% of the subjects receive effective supportive care, the results for A and C would differ.
Figure 3. Antibiotic treatment for clinical pneumonia. Hypothetical results are shown for A = antibiotic for gram-positive organisms, B = antibiotic for gram-negative organisms, and C = antibiotic for gram-positive organisms that is more effective than A. In the sequential study, A bests B in the first trial (when 80% of the subjects are gram positive), and B bests C in the second trial (when 10% of the subjects are gram positive). However, as shown with the three-arm trial, it would be incorrect to make the transitive inference that when 10% of the subjects are gram positive, A is better than C.
Conclusion
Given only a previous randomized trial of B versus C and a new randomized trial of A versus B, inference about A versus C can be misleading. In contrast, a three-arm randomized trial of A, B, and C will yield appropriate inference about both A versus B and A versus C. Validity of the sequential-studies strategy (B versus C, and A versus B) rests on the assumption that there is no intervening important covariate that could confound the implied principle of transitivity. Given the amount of resources that are often invested in large "definitive" clinical trials, the possibility of such covariates should be an explicit part of the discussion in designing the trials and interpreting the results.
Authors' contributions
In discussions about designing a planned randomized trial of lung cancer screening with one of the trial investigators, the authors pointed out that the concept of transitivity might not hold and that there was therefore some risk in conducting a two-arm study. After further discussion, they decided to develop this concept into a manuscript. SGB had the general idea of using the graphic from Simpson's paradox to explain lack of transitivity. BSK worked out crucial details of how it applies in practice. SGB wrote the initial draft and BSK made substantial improvements. Both authors read and approved the final manuscript.
Competing interests
None declared.
Disclaimer
This work represents the opinions of the authors and does not necessarily represent the opinions of the federal government or of the Department of Health and Human Services.
References

1. Baker SG, Kramer BS: Good for women, good for men, bad for people: Simpson's paradox and the importance of sex-specific analysis in observational studies.
Journal of Women's Health & Gender-Based Medicine 2001, 10:867-872.

2. Langford E, Schwertman N, Owens M: Is the property of being positively correlated transitive?
The American Statistician 2001, 55:322-325.

3. Paulos JA: I Think Therefore I Laugh: The Flip Side of Philosophy.

4. Wainer H: The BK-Plot: Making Simpson's paradox clear to the masses.

5. Fontana RS, Sanderson DR, Woolner LB, Taylor WF, Miller WE, Muhm JR, et al.: Screening for lung cancer: a critique of the Mayo Lung Project.
Cancer 1991, 67(4 Suppl):1155-64.

6. Marcus PM, Bergstralh EJ, Fagerstrom RM, Williams DE, Fontana R, Taylor WF, Prorok PC: Lung Cancer Mortality in the Mayo Lung Project: Impact of Extended Follow-up.
Journal of the National Cancer Institute 2000, 92:1308-1316.