Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, USA

Department of Anesthesiology, Johns Hopkins Medical Institutions, Baltimore, MD, USA

Office of Disease Prevention and Medical Applications of Research, National Institutes of Health, Bethesda, MD, USA

Abstract

Background

Although a randomized trial represents the most rigorous method of evaluating a medical intervention, some interventions would be extremely difficult to evaluate using this study design. One alternative, an observational cohort study, can give biased results if it is not possible to adjust for all relevant risk factors.

Methods

A recently developed and less well-known alternative is the paired availability design for historical controls. The paired availability design requires at least 10 hospitals or medical centers in which there is a change in the availability of the medical intervention. The statistical analysis involves a weighted average of a simple "before" versus "after" comparison from each hospital or medical center that adjusts for the change in availability.

Results

We expanded requirements for the paired availability design to yield valid inference. (1) The hospitals or medical centers serve a stable population. (2) Other aspects of patient management remain constant over time. (3) Criteria for outcome evaluation are constant over time. (4) Patient preferences for the medical intervention are constant over time. (5) For hospitals where the intervention was available in the "before" group, a change in availability in the "after" group does not change the effect of the intervention on outcome.

Conclusion

The paired availability design has promise for evaluating medical versus surgical interventions, in which it is difficult to recruit patients to a randomized trial.

Background

In terms of avoiding bias, the most rigorous method for evaluating a medical intervention is the randomized controlled trial. However, many clinical investigators are unable to conduct a randomized trial because of excessive cost or required effort or difficulty overcoming strongly held beliefs among health care providers or patients. In these situations, a clinical investigator may consider a design and analysis based on observational data (Table

Comparison of Several Methods of Evaluating a Medical Intervention

| Method | Strengths | Weaknesses |
| --- | --- | --- |
| Randomized controlled trial | 1. No temporal bias; 2. No selection bias | 1. Cost and effort; 2. Recruitment |
| Observational cohort study | 1. No temporal bias | 1. Cost of data collection; 2. Selection bias if an important risk factor is omitted or not adequately quantified |
| Paired availability design | 1. Lessens selection bias | 1. Assumptions in Table |

One common method of inference from observational data is the cohort study with an adjustment for risk factors using, for example, regression models

Methods

An alternative and less widely known approach is the paired availability design for historical controls

The paired availability design avoids many of the biases of analyses based on traditional historical controls. With traditional historical controls, investigators compare outcome among subjects who receive a new intervention with outcome among a previous group of subjects who received the standard intervention. Selection bias often arises because subjects who receive the new intervention are typically not comparable to subjects who received the standard intervention

For both the paired availability design and randomized trials subject to noncompliance, the ideal goal is an unbiased estimate of the effect of receipt of treatment. If certain requirements hold, which we discuss, a simple adjustment gives an unbiased estimate of the effect of receipt of treatment in the paired availability design. Similarly, in certain situations involving randomized trials with noncompliance, such as switching interventions immediately after randomization, a similar adjustment also yields an unbiased estimate of the effect of receipt of treatment

Results

To assist investigators who are contemplating a paired availability design, we provide an expanded list of requirements for valid inference as well as a simpler method of data analysis than previously discussed in the literature.

Design

The paired availability design uses data collected in either a prospective or retrospective manner, or a combination of the two. Although implementing a multi-center study may initially appear burdensome, two factors lessen the burden: (1) randomization is not required and (2) investigators need not collect data on risk factors if the requirements hold. The requirements (to follow) are most likely to hold when the time period for the entire study is not too long. We recommend limiting the total study duration to not more than two years, recognizing there may be exceptions due to patient accrual rate, intervention, and outcome. If availability changes gradually, it is often sufficient to split the data halfway between the start of the "before" period and the end of the "after" period, although more sophisticated statistical techniques can be employed
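As a rough illustration of the halfway split just described, the following sketch assigns each record to the "before" or "after" group by comparing its date with the midpoint of the study window. The function name, the record layout (a tuple whose first element is a date), and the convention that a record falling exactly on the midpoint goes to the "after" group are assumptions made for illustration; they are not part of the published method.

```python
from datetime import date

def split_at_midpoint(records, start, end):
    """Split (date, payload) records into "before" and "after" groups.

    Hypothetical helper: the cut point is halfway between `start` (the
    beginning of the "before" period) and `end` (the end of the "after"
    period). Records dated on the midpoint are placed in "after".
    """
    midpoint = start + (end - start) / 2
    before = [r for r in records if r[0] < midpoint]
    after = [r for r in records if r[0] >= midpoint]
    return before, after

# Made-up records, for illustration only.
example_records = [
    (date(2020, 1, 2), "a"),
    (date(2020, 1, 6), "b"),
    (date(2020, 1, 10), "c"),
]
before_group, after_group = split_at_midpoint(
    example_records, date(2020, 1, 1), date(2020, 1, 11)
)
```

A fixed midpoint keeps the grouping rule independent of the outcome data, which is the main reason to prefer it over a cut point chosen after looking at the results.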

The change in availability between the "before" and "after" periods can take different forms which do not affect the design or analysis. With fixed availability, the intervention is available to all subjects who arrive during a certain time of day or day of the week. With random availability, the intervention is available only if the necessary personnel or equipment is available, which occurs at random. In either case, subjects can decide whether or not to undergo the intervention.

The study design has five requirements for making appropriate inference: stable population, stable treatment, stable evaluation, stable preference, and no effect of availability on the effect of intervention (Table

Requirements for Paired Availability Design

| Requirement | Specific Criteria |
| --- | --- |
| Stable population | 1. Hospital serves one geographic area or is military medical center; 2. No in- or out-migration; 3. Eligibility criteria constant over time; 4. No underlying change in prognosis over time |
| Stable treatment | 1. Other patient management constant over time |
| Stable evaluation | 1. Evaluation criteria constant over time |
| Stable preference | 1. No publicized credible reports; 2. No direct-to-consumer advertising |
| Effect of the intervention on outcome does not change with a change in availability (only applicable when some in "before" group receive intervention) | 1. Effect of intervention does not depend on when the intervention was given during the course of disease; 2. No learning curve for the intervention |

The first requirement, stable population, is that the composition of subjects eligible for the intervention should not change from the "before" to the "after" period in ways that would affect outcome. This requirement would be violated if subjects seek treatment because of the availability of the treatment under study. The assumption is therefore violated if hospitals advertise the availability of a new diagnostic test or medical intervention. In addition, each hospital or medical center should serve a well-defined population with little in- or out-migration. Examples include the only hospital in a geographic region or a military medical center. The presence of two or more hospitals in a region could introduce bias if the new intervention were available in only one hospital and it were not possible to exclude from the analysis patients who switched hospitals to undergo the new intervention. The stable population requirement would also be violated by changes in eligibility criteria over time. If eligibility is determined by a medical diagnosis, the method of diagnosis must not change over time. Lastly, the stable population requirement would be violated if the underlying prognosis of patients changed over time. For example, in a study of treatment for a viral infection that is spreading through a population, the most susceptible subjects would likely enter the study first, which would violate the stable population requirement if they have a worse prognosis after infection.

The second requirement, stable treatment, is that patient management unrelated to the intervention is identical in the "before" and "after" groups. Thus, in studying the effect of epidural analgesia on the probability of Cesarean section, other forms of obstetric management should be constant over time. Similarly, in studying the effect of an intense chemotherapeutic regimen for cancer on survival, the type of antibiotic should not change over time, as new and more effective antibiotics could lower treatment-related mortality irrespective of the efficacy of the anticancer regimen.

The third requirement, stable evaluation, is that the method of evaluation is identical in the "before" and "after" groups. For example, the use of a new radiologic test to stage cancer in the "after" group may artifactually improve prognosis of each stage, independent of the therapy

Because the paired availability design involves multiple hospitals or medical centers, random violations of the stable population, treatment, and evaluation requirements will tend to average out, and not affect the conclusion. The main concern is with systematic violations. To minimize systematic violations, if possible, a wide variety of hospitals or medical centers should be studied.

The fourth requirement, stable preference, is more likely to hold in the absence of new information in the "after" period that would change a subject's preference for the medical intervention. This requirement could be violated by a widely publicized report of a harmful side effect of the new treatment, or by direct-to-consumer advertising of the intervention. To the best of our knowledge, in the paired availability design to study the effect of epidural analgesia on the probability of Cesarean section, there were no credible reports of either detrimental or beneficial side effects to the mother or fetus from epidural analgesia and no relevant direct-to-consumer advertising. In contrast, if the media reported preliminary results that radioactive seed implants had fewer side effects than previous approaches for treating prostate cancer, healthier subjects who care most about the side effects may be more likely to request the new therapy than less healthy subjects who only care if treatment reduces the risk of mortality.

The fifth requirement is that the effect of the intervention on outcome does not change with a change in availability. Importantly, it applies only when there are some subjects in the "before" group who undergo the intervention. Mathematically the following two assumptions are required to estimate method effectiveness

The fifth requirement would be violated if increased availability caused some subjects to undergo the intervention sooner in the course of the disease, changing prognosis. The fifth requirement would also be violated if there were a learning curve with the new intervention, such as a surgical technique that improves with the number of procedures. If such violations are likely, the design should be restricted to hospitals or medical centers where no subjects in the "before" group received the intervention.
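The restriction in the last sentence amounts to a simple filter on the "before" fraction. The sketch below applies it to the e1 values printed in the Baker and Lindeman example table in this article. The function name and the `tolerance` parameter are hypothetical, and because the table reports e1 to three decimals, a printed value of .000 may hide a very small nonzero fraction.

```python
# e1 = fraction of the "before" group that received epidural analgesia,
# keyed by hospital number, as printed in the example table.
BEFORE_FRACTION = {
    1: .586, 2: .290, 3: .131, 4: .100, 5: .000, 6: .000,
    7: .010, 8: .008, 9: .187, 10: .467, 11: .328,
}

def hospitals_without_prior_use(before_fraction, tolerance=0.0):
    """Return hospitals whose "before" fraction is at most `tolerance`.

    Hypothetical helper: `tolerance` is an assumption that lets an analyst
    treat a negligible "before" fraction as zero.
    """
    return sorted(h for h, e1 in before_fraction.items() if e1 <= tolerance)

restricted = hospitals_without_prior_use(BEFORE_FRACTION)
```

Restricting the analysis this way reduces the number of hospitals, so it trades robustness to violations of the fifth requirement against precision.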

Baker and Lindeman provided a formula to calculate the required number of hospitals or medical centers to achieve sufficient power for hypothesis testing

Analysis

The purpose of the analysis is to estimate the effect of the receipt of the medical intervention, which is also called method-effectiveness

D/F, where

D = difference in outcome before and after the change in availability

F = difference in the fraction that received the intervention before and after the change in availability

If the outcome measure is a continuous variable, such as a blood measurement, D is the difference in the average outcomes between the "before" and "after" groups. If the outcome measure is binary, such as success or failure, D is the difference in the fraction who fail or succeed in the "before" and "after" groups.
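For a binary outcome, the per-hospital calculation can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, the standard error follows the formula given in the notes to the example table, and the inputs are the values printed for hospital 4 in that table.

```python
import math

def paired_availability_estimate(n1, e1, p1, n2, e2, p2):
    """Return (y, s) for one hospital with a binary outcome.

    n1, n2: number of subjects in the "before" and "after" groups
    e1, e2: fraction receiving the intervention in each group
    p1, p2: fraction with the outcome in each group
    """
    f = e2 - e1  # F: change in the fraction receiving the intervention
    if f == 0:
        raise ValueError("no change in availability, so D/F is undefined")
    d = p2 - p1  # D: change in the outcome fraction
    y = d / f
    # Standard error of y, treating e1 and e2 as fixed (as in the table notes).
    s = math.sqrt(p2 * (1 - p2) / n2 + p1 * (1 - p1) / n1) / abs(f)
    return y, s

# Hospital 4 of the example table.
y4, s4 = paired_availability_estimate(1000, .100, .040, 1000, .450, .050)
```

The explicit check for F = 0 matters in practice: a hospital with no change in the fraction receiving the intervention carries no information about method-effectiveness and cannot be included.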

The above estimate, D/F, has an analog in the analysis of randomized trials when some subjects switch treatments soon after randomization. With an intent-to-treat analysis, one can estimate use-effectiveness, D*, which is the effect of random assignment of treatment on outcome. Similarly, one can estimate F*, the fraction of subjects in the study group that received the new treatment minus the fraction of subjects in the control group that received the new treatment. Invoking an assumption analogous to the fifth requirement for the paired availability design, the estimated method-effectiveness is D*/F*
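Written out, the randomized-trial analog reads as follows. The subscripts S and C, denoting the arms randomized to the study and control groups, are notation assumed here for illustration and do not appear in the source.

```latex
\[
D^{*} = \bar{y}_{S} - \bar{y}_{C}, \qquad
F^{*} = f_{S} - f_{C}, \qquad
\text{estimated method-effectiveness} = \frac{D^{*}}{F^{*}},
\]
% \bar{y}_{S}, \bar{y}_{C}: average outcome in each randomized arm (intent-to-treat)
% f_{S}, f_{C}: fraction of each randomized arm that received the new treatment
```

The structure matches D/F in the paired availability design, with random assignment playing the role of the change in availability.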

As illustrated in the calculations accompanying Table

Example of calculations from data in Baker and Lindeman [Reference 9]

| hospital | "before" group: n1 | e1 | p1 | "after" group: n2 | e2 | p2 | estimate y | std error s | weight w |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 116 | .586 | .172 | 103 | .223 | .184 | -.033 | .143 | 44 |
| 2 | 180 | .290 | .080 | 180 | .440 | .090 | .067 | .196 | 24 |
| 3 | 373 | .131 | .110 | 421 | .587 | .100 | -.022 | .048 | 208 |
| 4 | 1000 | .100 | .040 | 1000 | .450 | .050 | .029 | .026 | 313 |
| 5 | 1298 | .000 | .074 | 1084 | .480 | .065 | -.019 | .022 | 333 |
| 6 | 1919 | .000 | .275 | 2073 | .316 | .229 | -.146 | .044 | 225 |
| 7 | 3195 | .010 | .030 | 3733 | .290 | .031 | .004 | .015 | 365 |
| 8 | 4778 | .008 | .194 | 4859 | .586 | .190 | -.006 | .014 | 369 |
| 9 | 4685 | .187 | .149 | 6170 | .551 | .125 | -.046 | .015 | 352 |
| 10 | 8108 | .467 | .248 | 9918 | .678 | .280 | .152 | .031 | 288 |
| 11 | 11159 | .328 | .209 | 11869 | .499 | .209 | .000 | .031 | 288 |

n1 (n2) = number of subjects in the "before" ("after") group; e1 (e2) = fraction of subjects in the "before" ("after") group that had epidural analgesia; p1 (p2) = fraction of subjects in the "before" ("after") group that had a Cesarean section; y = estimated effect of epidural analgesia on the probability of Cesarean section = (p2-p1)/(e2-e1); s = standard error of y = square root of [(p2 (1-p2)/n2 + p1 (1-p1)/n1) / (e2-e1)^{2}]; w^{*} = weight used in random-effects meta-analysis.

We computed the weights as follows. Let i index studies, so y_{i} and s_{i} are the values of y and s for study i. It is convenient to define w_{i} = 1/s_{i}^{2}. Following DerSimonian and Laird [Reference 19], to compute v, the variance of the true effect among the k studies, we set v equal to the larger of (Q - (k-1)) / (Σw_{i} - Σw_{i}^{2}/Σw_{i}) and 0, where Q = Σw_{i} (y_{i} - m)^{2} and m = Σy_{i} w_{i}/Σw_{i}. The random-effects weights are w^{*}_{i} = 1/(s_{i}^{2} + v), and the summary statistic is y^{*} = Σy_{i} w^{*}_{i}/Σw^{*}_{i}, with standard error s^{*} = square root of (1/Σw^{*}_{i}). Following Proschan and Follman [Reference 20], the 95% confidence interval is (y^{*} - t_{k-1} s^{*}, y^{*} + t_{k-1} s^{*}), where t_{k-1} is the 97.5th percentile of a t-distribution with k-1 degrees of freedom. In this example, k = 11, Q = 50.1, v = .0025, m = -.007, s^{*} = .019, y^{*} = -.005, t_{10} = 2.23, and the 95% confidence interval is (-.047, .037).
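The random-effects calculation described in these notes can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function name and the returned dictionary keys are hypothetical, the critical value 2.23 is taken from the notes rather than computed, and the inputs are the y and s columns as printed. Because those columns are rounded to three decimals, the output should be expected to approximate, not reproduce, the reported Q, v, y^{*}, and s^{*}.

```python
import math

# Estimates y and standard errors s for the 11 hospitals, as printed in the table.
Y = [-.033, .067, -.022, .029, -.019, -.146, .004, -.006, -.046, .152, .000]
S = [.143, .196, .048, .026, .022, .044, .015, .014, .015, .031, .031]

def random_effects_summary(y, s, t_crit):
    """DerSimonian-Laird summary with a t-based confidence interval.

    t_crit: 97.5th percentile of a t-distribution with k-1 degrees of freedom,
            supplied by the caller (2.23 for k = 11, per the table notes).
    """
    k = len(y)
    w = [1 / si ** 2 for si in s]                        # fixed-effect weights
    sum_w = sum(w)
    m = sum(yi * wi for yi, wi in zip(y, w)) / sum_w     # fixed-effect mean
    q = sum(wi * (yi - m) ** 2 for yi, wi in zip(y, w))  # heterogeneity statistic
    denom = sum_w - sum(wi ** 2 for wi in w) / sum_w
    v = max(0.0, (q - (k - 1)) / denom)                  # between-study variance
    w_star = [1 / (si ** 2 + v) for si in s]             # random-effects weights
    sum_w_star = sum(w_star)
    y_star = sum(yi * wi for yi, wi in zip(y, w_star)) / sum_w_star
    s_star = math.sqrt(1 / sum_w_star)
    ci = (y_star - t_crit * s_star, y_star + t_crit * s_star)
    return {"m": m, "q": q, "v": v, "w_star": w_star,
            "y_star": y_star, "s_star": s_star, "ci": ci}

result = random_effects_summary(Y, S, t_crit=2.23)
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; an analyst with a statistics library available could compute it from k instead.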

Example

Baker and Lindeman applied the paired availability design to study the effect of epidural analgesia on the probability of Cesarean section

Using the aforementioned method of analysis, with more details in the notes for Table

Discussion

The paired availability design has promise for evaluating medical versus surgical interventions. For such an evaluation, it would be difficult to recruit patients to a randomized trial because few patients want to be randomized to those options. Also, many physicians feel uncomfortable assigning their patients to invasive versus non-invasive interventions. Thus a validated alternative method of evaluation would be of considerable value. We think that, in some cases, the paired availability design would be well suited for this type of evaluation. The key to the stable population requirement is having clear and constant eligibility criteria. For stable treatment, ancillary care and the method of evaluation must be the same over time. For the stable preference assumption to hold, there should be no advertising of the medical intervention. For the requirement of no effect of availability on the effect of intervention, either the surgical technique should have stabilized or the design should only include hospitals with no previous surgeries.

A possible example would be an analysis of surgical removal of liver metastases in patients with colorectal cancer. Although liver metastatectomy has been associated with favorable outcomes, a more rigorous evaluation is needed. An analysis of prospective cohort data is likely to be biased because of the difficulty observing or quantifying important risk factors such as patient performance status, tumor doubling times, and meticulous staging.

Several conditions listed in Table

A well-designed randomized study of liver metastatectomy would still give a more statistically valid assessment of the procedure than the paired availability design. However, such a randomized study has never been done, despite the use of metastatectomy for many years. A paired availability design would likely be subject to fewer biases than a cohort study comparing outcomes of patients who did versus those who did not undergo the surgical procedure.

To decide if a method for analyzing observational data is generally reliable, one should have experience comparing the results to those obtained from a randomized trial. In the only application of the paired availability design to date, Baker and Lindeman obtained results from the paired availability design similar to those from a meta-analysis of randomized trials. These results differed substantially from a multivariate adjustment for concurrent controls, which likely omitted an important risk factor^{2}. We hope this article will spur new studies using the paired availability design, including some comparing the results to those from randomized trials.

Conclusion

We wish to emphasize that the randomized trial represents the strongest form of evaluation and should be implemented if possible. However, we recognize that there are situations where the randomized trial is difficult to implement, such as comparing medical versus surgical interventions. If the requirements for the paired availability design are met, we recommend it as an alternative with advantages over the usual analyses from observational studies.

Competing Interests

None Declared
