In recent years there has been increased interest in evaluating breast cancer screening using data from before-and-after studies in multiple geographic regions. One approach, not previously mentioned, is the paired availability design. The paired availability design was developed to evaluate the effect of medical interventions by comparing changes in outcomes before and after a change in the availability of an intervention in various locations. A simple potential outcomes model yields estimates of efficacy, the effect of receiving the intervention, as opposed to effectiveness, the effect of changing the availability of the intervention. By combining estimates of efficacy rather than effectiveness, the paired availability design avoids confounding due to different fractions of subjects receiving the interventions at different locations. The original formulation involved short-term outcomes; the challenge here is accommodating long-term outcomes.
The outcome is incident breast cancer deaths in a time period, namely breast cancer deaths in that period among women whose breast cancer was diagnosed in the same period. We considered the plausibility of the five basic assumptions of the paired availability design and propose a novel analysis to accommodate likely violations of the assumption of stable screening effects.
We applied the paired availability design to data on breast cancer screening from six counties in Sweden. The estimated yearly change in incident breast cancer deaths per 100,000 persons ages 40–69 (in most counties) due to receipt of screening (among the relevant type of subject in the potential outcomes model) was -9 with 95% confidence interval (-14, -4) or (-14, -5), depending on the sensitivity analysis.
In a realistic application, the extended paired availability design yielded reasonably precise confidence intervals for the effect of receiving screening on the rate of incident breast cancer death. Although the assumption of stable preferences may be questionable, its impact will be small if there is little screening in the first time period. However, estimates may be substantially confounded by improvements in systemic therapy over time. Therefore the results should be interpreted with care.
The paired availability design is a study design and method of analysis that reduces selection bias when using data from historical controls [1-4]. With standard historical controls, one compares (i) outcomes in subjects in the current time period who received treatment with (ii) outcomes in subjects in an earlier time period who did not receive treatment. Because of self-selection (e.g., less healthy subjects might be more likely to receive treatment than healthier subjects), results from standard historical controls may be substantially biased. In contrast, the paired availability design has no self-selection bias because it compares (i) outcomes in all subjects in the current time period, when the intervention is more widely available, with (ii) outcomes in all subjects in the previous time period, when the intervention was less widely available. To account for the change in availability of the intervention between the current and previous time periods, Baker and Lindeman proposed a potential outcomes model based on the intervention subjects would have received had they entered the study in a different time period. The model makes it possible to estimate efficacy, the effect of the intervention among the type of subjects who would receive the intervention only during a period of increased availability, as opposed to effectiveness, the effect of a change in availability. The model requires various assumptions, described in detail in earlier work on the design, that are plausible in many situations.
Estimating efficacy, as opposed to estimating effectiveness, is important when combining estimates from different locations (hospitals or regions). If the fraction of subjects who receive intervention differs among locations, it is difficult to interpret the overall estimate of effectiveness. In contrast, the overall estimate of efficacy is not confounded by varying the fraction of subjects who receive intervention in different locations.
Heretofore the paired availability design has only been formulated for evaluating the effect of an intervention on a short-term endpoint, namely the effect of epidural analgesia on the probability of Caesarian section [1-4]. To extend the paired availability design to breast cancer screening, we need to consider the implications of long-term endpoints.
The first step in extending the paired availability design to the evaluation of breast cancer screening is to identify various geographic regions with a change in the availability of breast cancer screening from time period 0 to time period 1. To simplify this discussion, we presume that screening is more available in time period 1 than in time period 0. The methodology also applies in the unlikely situation in which the reverse is true in some or all regions. The change in availability is a change in the fraction of the eligible population who are invited for screening. Following Duffy et al., there are three basic design requirements, to which we have added a fourth.
Requirement 1. The time periods should be long enough for screening to achieve its maximal (or nearly maximal) impact on breast cancer mortality rates.
Requirement 2. For each geographic region, time periods 0 and 1 should be the same length.
Requirement 3. The outcome in each time period is incident breast cancer deaths, namely deaths from breast cancer in the specified time period arising from breast cancer diagnosed during the same time period.
Requirement 4. We consider only situations in which most screening occurs at regular intervals of the same length during each time period.
Requirement 1 can be relaxed in the special case in which the two time periods are separated by a gap. In that case one need only require that the observation windows, each running from the start of a time period to some point beyond its end without overlapping the other window, be long enough to maximize the impact of screening on breast cancer mortality. However, as with randomized trials, if follow-up after the last screening is too long, there could be considerable dilution from breast cancers that could not have benefited from screening, which would reduce the efficiency of the estimates.
The rationale for Requirement 2 is that one wants the breast cancer mortality rates to be the same in the two time periods if screening has no effect and if there are no time-varying changes that could confound the results.
The rationale for Requirement 3 is that by using incident breast cancer deaths (instead of all breast cancer deaths) as an outcome measure, one can avoid dilution from the breast cancer deaths that could not have benefited from screening [5,7-10]. However, using incident breast cancer deaths preferentially selects for evaluation those breast cancers that cause death soon after diagnosis, because these deaths are more likely to occur in the same time period as the diagnosis. With time periods substantially longer than the mean time from breast cancer diagnosis to death, this preferential selection is mitigated: compared with short time periods, a larger fraction of the breast cancers diagnosed during the period can cause death long after diagnosis (in the absence of screening) and still be included in the evaluation. Nevertheless, this preferential selection should be borne in mind.
The rationale for Requirement 4 is that the screening intervention must be comparable in the two time periods.
Potential outcomes model
For each before-and-after geographic region, our goal is to estimate the efficacy of breast cancer screening, which we define as the change in average yearly probability of incident breast cancer deaths due to the receipt of screening. (We later discuss combining estimates over all regions.) As proposed in the paired availability design [1-4], we use the following thought experiment to set the groundwork for estimating efficacy. Under this thought experiment, there are four types of subjects:
A, always-receivers, who would receive screening in either time period,
C, consistent-receivers, who would not receive screening in the time period with less availability and would receive it in the time period with greater availability,
I, inconsistent-receivers, who would receive screening in the time period with less availability and would not receive it in the time period with greater availability,
N, never-receivers, who would not receive screening in either time period.
For the sake of simplicity, we assume two conditions: (1) all-or-none behavior (i.e. an individual either receives all screens at the recommended interval or none, but does not switch back and forth), and (2) there is a single dominant screening test rather than a choice among screening tests of varying efficacy. In our application, there was only one screening modality.
Let πiAz, πiCz, πiIz, and πiNz denote the probabilities of subject types A, C, I, N, respectively, in region i and time period z. Let βiAz, βiCz, βiIz and βiNz denote the probability of incident breast cancer death in time period z and region i, for subject types A, C, I, and N, respectively. The probability of incident breast cancer death in each time period is a mixture of the probabilities over all subject types in each time period,
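$$\begin{aligned} \theta_{i0} &= \pi_{iN0}\beta_{iN0} + \pi_{iC0}\beta_{iC0} + \pi_{iI0}\beta_{iI0} + \pi_{iA0}\beta_{iA0} && \text{(time period 0)},\\ \theta_{i1} &= \pi_{iN1}\beta_{iN1} + \pi_{iC1}\beta_{iC1} + \pi_{iI1}\beta_{iI1} + \pi_{iA1}\beta_{iA1} && \text{(time period 1)}. \end{aligned} \tag{1}$$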
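As an illustration of the mixture in (1), the following Python sketch computes θ for one region and time period from type probabilities and type-specific risks. The function name `mixture_probability` and every number below are hypothetical choices for illustration. They are not values from the Swedish data.

```python
from typing import Mapping

# Subject types in the potential outcomes model:
# A = always-, C = consistent-, I = inconsistent-, N = never-receivers.
SUBJECT_TYPES = ("A", "C", "I", "N")


def mixture_probability(pi: Mapping[str, float], beta: Mapping[str, float]) -> float:
    """Probability of incident breast cancer death in one region and period.

    pi[t]: probability that a subject is of type t (hypothetical input)
    beta[t]: probability of incident breast cancer death for type t
    """
    total = sum(pi[t] for t in SUBJECT_TYPES)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"type probabilities must sum to 1, got {total}")
    # Equation (1): type-specific risks weighted by type probabilities.
    return sum(pi[t] * beta[t] for t in SUBJECT_TYPES)


# Hypothetical region with no screening in time period 0 (no type A or I subjects).
pi_0 = {"A": 0.0, "C": 0.6, "I": 0.0, "N": 0.4}
beta_0 = {"A": 0.0, "C": 0.003, "I": 0.0, "N": 0.003}
theta_0 = mixture_probability(pi_0, beta_0)  # 0.6*0.003 + 0.4*0.003, about 0.003
```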
As with the standard paired availability design, to ensure identifiability we restrict the estimation of efficacy to type C subjects. Let Ti denote the length, in years, of each of time periods 0 and 1 in region i. We define the efficacy (for type C subjects) in region i as
$$\Delta_i = \frac{\beta_{iC1} - \beta_{iC0}}{T_i}. \tag{2}$$
The quantity Δi in (2) differs from a naive comparison between subjects who receive screening in time period 1 and subjects who do not receive screening in time period 1. Instead, Δi is the effect of receiving screening among type C subjects. Related potential outcomes models were independently formulated for randomized trials with all-or-none compliance [11,12].
In order to estimate (2), we require the following assumptions, adapted from those of the standard paired availability design.
Assumption 1. (Stable population)
The characteristics of the population that affect the probability of incident breast cancer death are constant over time.
Assumption 2. (Stable treatment)
The screening modality and therapy following diagnosis do not change over time.
Assumption 3. (Stable evaluation)
The outcome measure, incident breast cancer deaths, does not change in definition over time.
Assumption 4. (Stable preferences)
Factors affecting the decision to receive screening do not change over time.
Assumption 5. (Stable screening effects)
The effect of screening on the probability of incident breast cancer death does not change over time.
Assumption 1 is plausible if there is little immigration or out-migration related to screening. However, substantial immigration or out-migration of subpopulations with different underlying health or cancer risk can affect results over long time periods.
Assumption 2 is problematic in evaluating some screening modalities because a reduction in incident breast cancer deaths could result from better systemic therapies, such as chemotherapy and hormonal therapy. These changes in therapy could explain a decrease in breast cancer mortality rates over time, even if screening had no benefit. Therefore it is particularly important to consider the plausibility of Assumption 2.
Assumption 3 is plausible in the absence of major changes in cause-of-death coding systems.
The basic idea of Assumption 4 is that the screening intervention (including any campaigns to increase public awareness) should be the same in both time periods. If Assumptions 1–4 hold, the probability of each subject type does not change over time; in other words, πiAz = πiA, πiCz = πiC, πiIz = πiI, and πiNz = πiN. In addition, by virtue of Assumption 4, there are essentially no inconsistent receivers, i.e., πiI = 0. If there is no screening in time period 0, then πiAz = πiIz = 0, and Assumption 4 only requires πiCz = πiC and πiNz = πiN, which is very plausible, especially if one views public awareness as part of the screening intervention.
Assumption 5 likely holds for type N subjects because the same prior history of no screening applies to both time periods 0 and 1. Thus we can reasonably assume that the probability of incident breast cancer death among type N subjects does not depend on time period, i.e., βiN0 = βiN1 ≡ βiN. However, unless there is no screening in time period 0, Assumption 5 will not hold for type A subjects, because (i) screening is generally more available before time period 1 than before time period 0, and (ii) prior screening may affect the probability of incident breast cancer death if screening confers benefit.
As a consequence of the above assumptions (and not applying Assumption 5 to type A subjects), we can write (1) as
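$$\begin{aligned} \theta_{i0} &= \pi_{iN}\beta_{iN} + \pi_{iC}\beta_{iC0} + \pi_{iA}\beta_{iA0},\\ \theta_{i1} &= \pi_{iN}\beta_{iN} + \pi_{iC}\beta_{iC1} + \pi_{iA}\beta_{iA1}. \end{aligned} \tag{3}$$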
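Because θi1 − θi0 = πiC (βiC1 − βiC0) + πiA (βiA1 − βiA0), we obtain from (2) and (3)
$$\Delta_i = \frac{\theta_{i1} - \theta_{i0} - \pi_{iA}\,(\beta_{iA1} - \beta_{iA0})}{\pi_{iC}\,T_i}. \tag{4}$$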
If Assumption 5 held for type A subjects, as in the usual paired availability design, then βiA1 = βiA0, and we would obtain the standard formula for efficacy in the paired availability design, averaged over the duration of the time period: Δi = (θi1 − θi0)/(πiC Ti). We would also obtain the standard formula if there were no screening in time period 0 (and thus no type A subjects).
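A small numerical check can show why the type A correction in (4) matters. The Python sketch below builds θi0 and θi1 from (3) using made-up type probabilities and risks. It then confirms that (4) recovers the true Δi from (2), whereas the standard formula, which assumes βiA1 = βiA0, does not. The function name and all numbers are hypothetical.

```python
def efficacy_type_c(theta0: float, theta1: float, pi_a: float, pi_c: float,
                    beta_a0: float, beta_a1: float, t_years: float) -> float:
    """Equation (4): yearly change in incident-death probability among type C.

    The term pi_a * (beta_a1 - beta_a0) removes the change in risk among
    always-receivers (type A), whose risk need not be stable over time.
    """
    if pi_c <= 0:
        raise ValueError("pi_c must be positive: availability must increase")
    if t_years <= 0:
        raise ValueError("t_years must be positive")
    type_a_change = pi_a * (beta_a1 - beta_a0)
    return (theta1 - theta0 - type_a_change) / (pi_c * t_years)


# Hypothetical truth: pi_N = 0.4, pi_C = 0.5, pi_A = 0.1, T_i = 10 years.
beta_n, beta_c0, beta_c1, beta_a0, beta_a1 = 0.003, 0.003, 0.002, 0.002, 0.0015
theta0 = 0.4 * beta_n + 0.5 * beta_c0 + 0.1 * beta_a0   # equation (3), period 0
theta1 = 0.4 * beta_n + 0.5 * beta_c1 + 0.1 * beta_a1   # equation (3), period 1

delta = efficacy_type_c(theta0, theta1, 0.1, 0.5, beta_a0, beta_a1, 10)
true_delta = (beta_c1 - beta_c0) / 10                    # equation (2)
# Setting beta_a1 = beta_a0 reproduces the standard formula (theta1 - theta0)/(pi_C T).
standard = efficacy_type_c(theta0, theta1, 0.1, 0.5, beta_a0, beta_a0, 10)
```

In this made-up example, the type A risk also falls between periods. The standard formula therefore attributes that fall to type C subjects and overstates the benefit of screening (about −0.00011 versus the true −0.0001 per year).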
In order to estimate (4), we need to estimate θiz, πiC, πiA, and βiAz. Following the standard paired availability design, we can estimate the first three parameters as follows. Let s = 1 if screening was received during the time period and s = 0 otherwise. Following Requirement 4, we assume most screening occurs at regular intervals during the time period. Let y = 1 if the subject had an incident breast cancer death and y = 0 otherwise.
In the ideal scenario (Scenario I) the investigators would report data nizsy, which is the number of subjects in region i and time period z with indicator of receipt of screening s and outcome y. In the typical scenario (Scenario II), the only data in published reports are the numbers who received or did not receive screening nizs+ and the numbers with a given outcome y but unknown screening status niz+y, where "+" denotes summation over the indicated subscript. For both scenarios, we obtain the following estimates,
$$\hat\theta_{iz} = n_{iz+1}/n_{iz++} \tag{6}$$
is the fraction of subjects in time period z with incident breast cancer death,
$$\hat\pi_{iA} = n_{i01+}/n_{i0++} \tag{7}$$
is the fraction who received screening in time period 0 (type A), and
$$\hat\pi_{iC} = n_{i11+}/n_{i1++} - n_{i01+}/n_{i0++} \tag{8}$$
is the fraction who received screening in time period 1 (a combination of types A and C) minus the fraction who received screening in time period 0 (type A).
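The estimates (6)–(8) use only the marginal counts that are typically published. This short Python sketch computes them for one region. The `PeriodCounts` container and its field names are hypothetical, and the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodCounts:
    """Published counts for one region and time period (hypothetical layout)."""
    n_total: int     # n_{iz++}: persons eligible in the period
    n_screened: int  # n_{iz1+}: persons who received screening
    n_deaths: int    # n_{iz+1}: incident breast cancer deaths


def basic_estimates(period0: PeriodCounts, period1: PeriodCounts) -> dict:
    """Estimates (6)-(8) for one region."""
    theta0 = period0.n_deaths / period0.n_total          # (6), z = 0
    theta1 = period1.n_deaths / period1.n_total          # (6), z = 1
    pi_a = period0.n_screened / period0.n_total          # (7): type A
    pi_c = period1.n_screened / period1.n_total - pi_a   # (8): (A + C) minus A
    return {"theta0": theta0, "theta1": theta1, "pi_a": pi_a, "pi_c": pi_c}


# Hypothetical counts, not taken from the Swedish data.
est = basic_estimates(PeriodCounts(100_000, 10_000, 290),
                      PeriodCounts(100_000, 60_000, 235))
```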
If we had the full data nizsy, we could estimate βiA0. However because subjects in time period 1 who receive screening are a combination of types C and A, we cannot uniquely estimate βiA1. We discuss how to circumvent this difficulty in the two scenarios.
Scenario I: Full reporting of data
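When the data are fully reported, we can estimate θizs = pr(Y = 1, S = s | i, z), the joint probability of screening status s and incident breast cancer death in region i and time period z, by θ̂izs = nizs1/niz++. Under the potential outcomes model, we write
$$\begin{aligned} \theta_{i00} &= \pi_{iN}\beta_{iN} + \pi_{iC}\beta_{iC0}, & \theta_{i01} &= \pi_{iA}\beta_{iA0},\\ \theta_{i10} &= \pi_{iN}\beta_{iN}, & \theta_{i11} &= \pi_{iC}\beta_{iC1} + \pi_{iA}\beta_{iA1}. \end{aligned} \tag{9}$$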
We introduce an exogenous parameter h, which is the relative risk for incident breast cancer death among type A subjects in time period 1 versus time period 0, namely,
$$\beta_{iA1} = h\,\beta_{iA0}. \tag{10}$$
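We discuss the specification of h in the section below on lead-time adjustment. Combining (9), (10), and (4) gives
$$\hat\Delta_i = \frac{\hat\theta_{i1} - \hat\theta_{i0} + (1 - h)\,\hat\theta_{i01}}{\hat\pi_{iC}\,T_i}. \tag{11}$$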
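The Scenario I estimator can be sketched from the full cross-classification nizsy. Because πiA βiA1 − πiA βiA0 = (h − 1) θi01 under (9) and (10), the type A correction needs only the joint fraction screened with an incident death in time period 0. The Python sketch below assumes θi01 is that joint probability, as (9) implies. The dictionary layout, the function name, h = 0.75, and all counts are hypothetical.

```python
from typing import Dict, Tuple

Counts = Dict[Tuple[int, int, int], int]  # (z, s, y) -> n_{izsy} for one region


def scenario_one_efficacy(n: Counts, h: float, t_years: float) -> float:
    """Scenario I estimate of efficacy from fully reported counts n_{izsy}."""

    def total(z: int) -> int:
        return sum(n[(z, s, y)] for s in (0, 1) for y in (0, 1))

    def screened(z: int) -> int:
        return n[(z, 1, 0)] + n[(z, 1, 1)]

    def deaths(z: int) -> int:
        return n[(z, 0, 1)] + n[(z, 1, 1)]

    n0, n1 = total(0), total(1)
    theta0, theta1 = deaths(0) / n0, deaths(1) / n1
    theta01 = n[(0, 1, 1)] / n0          # joint pr(S = 1, Y = 1) in period 0
    pi_a = screened(0) / n0
    pi_c = screened(1) / n1 - pi_a
    return (theta1 - theta0 + (1.0 - h) * theta01) / (pi_c * t_years)


# Hypothetical counts, 100,000 persons in each time period.
counts = {
    (0, 0, 0): 89_730, (0, 0, 1): 270, (0, 1, 0): 9_980, (0, 1, 1): 20,
    (1, 0, 0): 39_880, (1, 0, 1): 120, (1, 1, 0): 59_885, (1, 1, 1): 115,
}
delta_hat = scenario_one_efficacy(counts, h=0.75, t_years=10)
```

These counts were generated from πiN = 0.4, πiC = 0.5, πiA = 0.1, βiC0 = 0.003, βiC1 = 0.002, and βiA1 = 0.75 βiA0. The sketch therefore returns the built-in efficacy of −0.0001 per year.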
The asymptotic variance is approximately
Scenario II. Limited reporting of data
With limited reporting of data we introduce a second exogenous parameter k to essentially create the same estimates as with the full reporting of data. In particular we write
$$\theta_{i01} = k\,\theta_{i0}, \tag{13}$$
where k = pr(S = 1 | Y = 1, Z = 0) is the fraction of incident breast cancer deaths in time period 0 that occurred among screened subjects. If the full data were available, we would have an estimate of k and the methodology would be equivalent to that for Scenario I. In the absence of reported data, we propose a sensitivity analysis for k. A lower bound on k is 0. An upper bound, assuming screening does not cause cancer deaths, is an estimate of the fraction screened in time period 0, namely pr(S = 1 | Z = 0). Substituting (13) into (11) gives
$$\hat\Delta_i = \frac{\hat\theta_{i1} - \{1 - (1 - h)\,k\}\,\hat\theta_{i0}}{\hat\pi_{iC}\,T_i}, \tag{14}$$
where h is the same as in (10). We approximate the asymptotic variance by
Using actual reported data from the limited-data scenario, we checked the approximate variances in (12) and (15). We made reasonable assumptions to impute nizsy and then computed the asymptotic variance using the delta method. The agreement was excellent: using the data in the example and assuming a relative risk of incident breast cancer death of .7 for screened versus unscreened subjects, the approximate and exact asymptotic variances agreed to three significant digits.
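Substituting (13) into (11) gives the Scenario II estimate Δ̂i = [θ̂i1 − {1 − (1 − h)k} θ̂i0]/(π̂iC Ti). The sensitivity analysis for k can be sketched as an evaluation on an even grid from the lower bound 0 to the upper bound, the fraction screened in time period 0. The Python function names, the grid size, and the input values are hypothetical.

```python
from typing import List, Tuple


def scenario_two_efficacy(theta0: float, theta1: float, pi_c: float,
                          h: float, k: float, t_years: float) -> float:
    """Efficacy after substituting theta_{i01} = k * theta_{i0} into the Scenario I formula."""
    return (theta1 - (1.0 - (1.0 - h) * k) * theta0) / (pi_c * t_years)


def k_sensitivity(theta0: float, theta1: float, pi_a: float, pi_c: float,
                  h: float, t_years: float, n_points: int = 5) -> List[Tuple[float, float]]:
    """Evaluate efficacy for k on an even grid from 0 to pi_a (bounds from the text)."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    grid = [pi_a * j / (n_points - 1) for j in range(n_points)]
    return [(k, scenario_two_efficacy(theta0, theta1, pi_c, h, k, t_years)) for k in grid]


# Hypothetical summary estimates for one region.
results = k_sensitivity(theta0=0.0029, theta1=0.00235, pi_a=0.1, pi_c=0.5,
                        h=0.75, t_years=10)
```

The correction term (1 − h)k θ̂i0 is small when h is near one or when little screening occurred in time period 0. In that case the estimates change little across the grid, which is consistent with the nearly identical intervals reported for the two values of k.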
Lead time adjustment related to prior screening
We specify a value for h in (10) by the following argument based on lead time, the time from screen detection to clinical detection in the absence of screening. Duffy et al. discussed a related lead-time adjustment. The type A incident cancer deaths in a time period are composed of two subgroups: (i) subjects who would have been screen-diagnosed in the previous time period if there had been screening in the previous time period, and (ii) subjects who would not have been screen-diagnosed in the previous time period even if there had been screening then. Generally there is no screening before time period 0, so type A incident cancer deaths in time period 0 are composed of subgroups (i) and (ii). Type A subjects, by definition, receive screening in time period 0, so type A incident cancer deaths in time period 1 are composed only of subgroup (ii). The parameter h in (10) is the ratio of type A incident cancer deaths in time period 1 to type A incident cancer deaths in time period 0. We approximate h by the number of subjects in subgroup (ii) divided by the number of subjects in subgroups (i) and (ii) combined. Let L denote the mean lead time, which is approximately 2 years for breast cancer screening. Subjects in subgroup (i) would, on average, have been detected on screening in the last L years of the previous time period. Assuming uniform detection rates, we further approximate h by Ti, the length of the screen-detection window for subgroup (ii), divided by Ti + L, the length of the screen-detection window for subgroups (i) and (ii) combined. This gives h ≈ Ti/(Ti + L). Importantly, if L is short relative to Ti, there will be little bias, as h would approximately equal one.
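The approximation h ≈ Ti/(Ti + L) is simple enough to tabulate directly. This Python sketch uses the mean lead time of about 2 years quoted above as its default. The function name and the period lengths in the loop are illustrative assumptions.

```python
def lead_time_h(t_years: float, mean_lead_time: float = 2.0) -> float:
    """Approximate h = T / (T + L), assuming uniform screen-detection rates."""
    if t_years <= 0 or mean_lead_time < 0:
        raise ValueError("need t_years > 0 and mean_lead_time >= 0")
    return t_years / (t_years + mean_lead_time)


# With L = 2 years, h approaches one as the (hypothetical) time period lengthens.
for t in (5, 10, 20):
    print(t, round(lead_time_h(t), 3))
```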
Lead time adjustment related to age-range at diagnosis
Because incident cancer deaths are defined on the basis of age at diagnosis, there is also a subtle bias related to lead time. Suppose the age range at diagnosis of breast cancer for incident breast cancer deaths is 40–69 in both time periods. Consider type C subjects who would be clinically diagnosed if in time period 0 and screen-detected if in time period 1. Let L denote the average lead time. Because of lead time, a type C subject clinically diagnosed with cancer in the age range (40, 40 + L − 1) in time period 0 would (on average) be screen-detected in the age range (40 − L, 39) in time period 1. Such a subject would contribute to incident cancer deaths in time period 0 but not in time period 1. Similarly, a type C subject clinically diagnosed with cancer in the age range (70, 70 + L − 1) in time period 0 would (on average) be screen-detected in the age range (70 − L, 69) in time period 1. Such a subject would contribute to incident cancer deaths in time period 1 but not in time period 0. All other type C subjects would contribute to incident cancer deaths in both time periods. To approximately correct for the bias from a mean lead time of L years, suppose for simplicity that we divide age into three age groups, 40–49, 50–59, and 60–69, with the same number of subjects in each age group. A fraction L/10 of type C subjects in the 40–49 age group in time period 0 would be counted toward incident cancer deaths in time period 0 but not in time period 1. This age group therefore contributes only a fraction (10 − L)/10 of its time period 0 count in time period 1. Conversely, type C subjects amounting to a fraction L/10 of the 60–69 age group would not be counted toward incident cancer deaths in time period 0 but would be counted in time period 1. This age group therefore contributes a fraction (10 + L)/10 of its time period 0 count in time period 1. Consequently, for this adjustment, we multiply θi1 by b πiC + (1 − πiC), where
$$b = \frac{\frac{10 - L}{10}\,D_{40} + D_{50} + \frac{10 + L}{10}\,D_{60}}{D_{40} + D_{50} + D_{60}},$$
and Da is the five-year cumulative mortality following breast cancer diagnosis at age a. With L = 2 and approximating D40 = .167, D50 = .131, and D60 = .124, based on US population data, we obtain b = .98. In our example, the effect of lead-time bias due to a specified age range is negligible and is therefore ignored.
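The arithmetic behind b can be checked directly. A minimal sketch, assuming b is the mortality-weighted average of the per-age-group counting fractions, b = [((10 − L)/10) D40 + D50 + ((10 + L)/10) D60]/(D40 + D50 + D60). This form reproduces the reported b = .98 from L = 2 and the quoted D values. The Python function names are hypothetical.

```python
def age_range_factor(mean_lead_time: float, d40: float, d50: float, d60: float) -> float:
    """Mortality-weighted fraction of type C incident deaths still counted in period 1."""
    if not 0 <= mean_lead_time <= 10:
        raise ValueError("mean_lead_time must lie between 0 and 10 years")
    young = (10 - mean_lead_time) / 10 * d40  # 40-49 group loses L/10 to ages below 40
    old = (10 + mean_lead_time) / 10 * d60    # 60-69 group gains L/10 from ages 70+
    return (young + d50 + old) / (d40 + d50 + d60)


def adjustment_multiplier(b: float, pi_c: float) -> float:
    """Multiplier b * pi_C + (1 - pi_C) applied to theta_{i1}."""
    return b * pi_c + (1.0 - pi_c)


b = age_range_factor(2.0, d40=0.167, d50=0.131, d60=0.124)
print(round(b, 2))  # 0.98
```

Because b differs from one by only about 2% and is further diluted by 1 − πiC, the resulting multiplier stays close to one. This is why the adjustment can be ignored in the example.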
Combining estimates over regions
To obtain a combined estimate of efficacy over all regions, we use a simple random-effects meta-analysis [4,16,17]. Let wi = 1/var(Δ̂i), and let r denote the number of regions. Define
$$m = \frac{\sum_i w_i \hat\Delta_i}{\sum_i w_i}, \qquad Q = \sum_i w_i\,(\hat\Delta_i - m)^2, \qquad \hat\sigma^2 = \max\left\{0,\ \frac{Q - (r - 1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}\right\}.$$
The random-effects weights are wi* = 1/{var(Δ̂i) + σ̂²}, and the summary statistic is
$$\hat\Delta = \frac{\sum_i w_i^* \hat\Delta_i}{\sum_i w_i^*}, \qquad \text{with standard error} \qquad \mathrm{se}(\hat\Delta) = \Big(\sum_i w_i^*\Big)^{-1/2}.$$
The 95% confidence interval is (Δ̂ − tr−1 se(Δ̂), Δ̂ + tr−1 se(Δ̂)), where tr−1 is the 97.5th percentile of a t-distribution with r − 1 degrees of freedom. For an example of these calculations, see Table 1.
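These formulas correspond to the familiar DerSimonian–Laird moment estimator of the between-region variance. The Python sketch below implements them under that reading. To stay within the standard library, it takes the t critical value as an input. If SciPy is available, `scipy.stats.t.ppf(0.975, r - 1)` supplies that value. The function name and the three-region inputs are hypothetical and are not the Table 1 data.

```python
import math
from typing import Dict, Sequence


def random_effects_summary(estimates: Sequence[float], variances: Sequence[float],
                           t_crit: float) -> Dict[str, float]:
    """Random-effects summary of regional efficacy estimates.

    t_crit is the 97.5th percentile of a t-distribution with r - 1 degrees of
    freedom, supplied by the caller.
    """
    r = len(estimates)
    if r < 2 or len(variances) != r:
        raise ValueError("need at least two regions and one variance per estimate")
    w = [1.0 / v for v in variances]                           # fixed-effect weights
    sum_w = sum(w)
    m = sum(wi * d for wi, d in zip(w, estimates)) / sum_w
    q = sum(wi * (d - m) ** 2 for wi, d in zip(w, estimates))
    denom = sum_w - sum(wi ** 2 for wi in w) / sum_w
    sigma2 = max(0.0, (q - (r - 1)) / denom)                     # between-region variance
    w_star = [1.0 / (v + sigma2) for v in variances]             # random-effects weights
    summary = sum(ws * d for ws, d in zip(w_star, estimates)) / sum(w_star)
    se = 1.0 / math.sqrt(sum(w_star))
    return {"summary": summary, "se": se, "sigma2": sigma2,
            "lower": summary - t_crit * se, "upper": summary + t_crit * se}


# Hypothetical estimates per 100,000 for three regions; t(0.975, 2 df) is about 4.303.
result = random_effects_summary([-10.0, -4.0, -12.0], [4.0, 4.0, 4.0], t_crit=4.303)
```

When Q does not exceed r − 1, σ̂² is truncated at zero and the summary reduces to the fixed-effect weighted mean.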
Table 1. Data and calculations
We applied the methodology to before-and-after data on breast cancer screening in several Swedish counties. The original data involved 7 counties. However, one-third of the population of Dalarna County enrolled in a randomized trial, so some of the assumptions might be violated for women screened in Dalarna County. The main difficulty is that subjects who refused screening in the randomized trial may not be comparable to subjects outside the trial who did not obtain screening because of lack of availability. The methodology does not allow for this difference in comparability. Therefore, we dropped the Dalarna County data from our analysis. The age ranges were 40–74 for three counties, 40–69 for two counties, and 50–59 for one county.
The source data were reported in terms of person-years of receiving screening. Dividing person-years by the length of the time period, we obtained the approximate number of persons eligible for screening in each region and group, niz++. Using these data, we estimated the change in the average yearly death rate of incident breast cancer among type C subjects ages 40–69 due to receipt of breast cancer screening. The estimate was -9 per 100,000, with 95% confidence interval (-14, -4) per 100,000 for k = 0. It was likewise -9 per 100,000, with 95% confidence interval (-14, -5) per 100,000, when k equaled the fraction screened in time period 0. See Table 1 and Figure 1. The estimates were similar for the two values of k because only Vastmanland County had substantial screening in time period 0, and even there only 14% were screened. We caution that Assumption 2 may not hold because of improvements in available systemic therapy over the periods of interest. Therefore the results must be interpreted with caution, as they may overestimate the benefit of screening.
Figure 1. Estimated change (and 95% confidence intervals) in average yearly probability of incident cancer death due to receipt of screening per 100,000 type C subjects ages 40–69
Our methodology complements that of Duffy et al. They obtained qualitatively similar results (i.e., a statistically significant reduction in breast cancer mortality rates in time period 1 versus time period 0) based on data from seven Swedish counties. Unlike our analysis, Duffy et al. estimated relative risks instead of risk differences. Duffy et al. fit a Poisson regression model to data from all subjects in both time periods as well as data on those screened and not screened in the current time period. To adjust for self-selection bias, Duffy et al. fit a separate model to data from refusers and participants in a randomized trial in Dalarna County. An implicit assumption is that the self-selection adjustment based on refusers in the randomized trial would apply to women who did not receive screening outside the trial. The paired availability design does not use this assumption because it does not use data from a randomized trial. However, the paired availability design is subject to bias from changes in therapy over time, as is any analysis comparing outcomes in one period with those in another.
Another approach to the analysis of before-and-after cancer screening data is to regress, across regions, the change in cancer mortality rate over time on the change in screening rate over time. Sometimes the change in cancer incidence rates is used as a proxy for the change in screening rates. This approach attempts to extract more information from the data, because larger changes in cancer screening rates should ideally correspond to larger changes in cancer mortality rates. However, this type of regression based on population-level data can give different results from a regression based on individual-level data, a phenomenon known as the ecologic fallacy.
With additional data, it may be possible to adjust for the effect of changes in therapy over time, if an additional assumption is reasonable. Suppose we had additional data on incident cancer deaths in time periods 0 and 1 in regions in which there was no screening in either time period. If (i) the therapies in these regions are representative of the therapies in the regions with screening and (ii) the population characteristics are similar to the population characteristics in regions with screening, one could reasonably estimate the effect of changes in therapy on the probability of incident cancer death.
Although data were reported on person-years of eligibility for screening, we did not use a survival analysis. A survival analysis can be incorporated into the potential outcomes model for all-or-none compliance. However, such an analysis requires more data than were reported. Also, in this framework, even a constant hazard model would involve a complicated likelihood calculation.
Besides this method based on the paired availability design, one could also analyze observational screening data using the method of periodic screening evaluation. However, this method requires regular screenings, data on the number of cancers detected on screening and in the intervals between screenings, and follow-up of subjects diagnosed with cancer. The definitive method to evaluate cancer screening is a randomized trial. Observational approaches have a role because such trials are very expensive and difficult to implement. Thus this extension of the paired availability design could play an important role in cancer screening evaluation. It could do so only if there were no change in therapy over time or if one could adequately adjust for the effect of such a change.
The paired availability design can be extended to the evaluation of breast cancer screening by using incident breast cancer deaths as the outcome. The extension requires sufficiently long, equal-length time periods before and after a change in availability of periodic screening. However, the assumptions should be examined carefully. The assumption of stable preferences may be violated by a campaign to encourage screening, although the impact would be greatly mitigated if there were little screening in time period 0. The assumption of stable treatment may also be violated by changes in therapy over time.
SGB wrote the initial draft. BSK and PCP made substantial improvements to the manuscript.
We thank Ping Hu for helpful comments.
Statistics in Medicine 1994, 13:2269-2278.
Duffy SW, Tabar L, Chen HH, Holmqvist M, Yen MF, Abdsalah S, Epstein B, Frodis E, Ljungberg E, Hedborg-Melander C, Sundbom A, Tholin M, Wiege M, Akerlund A, Wu HM, Tung TS, Chiu YH, Chiu CP, Huang CC, Smith RA, Rosen M, Stenbeck M, Holmberg L: The impact of organized mammography service screening on breast carcinoma mortality in seven Swedish counties.
Statistics in Medicine 1987, 16:1017-1029.
Journal of Clinical Oncology 2003, 6:963-964.
American Journal of Epidemiology 1983, 118:865-86.
American Journal of Epidemiology 1988, 127:893-904.
Baker SG, Erwin D, Kramer BS, Prorok PC: Using observational data to estimate an upper bound on the reduction in cancer mortality due to periodic screening. http://www.biomedcentral.com/l471-2288/3/4