To preserve patient anonymity, health register data may be provided as binned data only. Here we consider as example, how to estimate mean survival time after a diagnosis of metastatic colorectal cancer from Norwegian register data on time to death or censoring binned into 30 day intervals. All events occurring in the first three months (90 days) after diagnosis were removed to achieve comparability with a clinical trial. The aim of the paper is to develop and implement a simple, and yet flexible method for analyzing such interval censored and truncated data.
Considering interval censoring a missing data problem, we implement a simple multiple imputation strategy that allows flexible sensitivity analyses with respect to the shape of the censoring distribution. To allow identification of appropriate parametric models, a χ2-goodness-of-fit test--also imputation based--is derived and supplemented with diagnostic plots. Uncertainty estimates for mean survival times are obtained via a simulation strategy. The validity and statistical efficiency of the proposed method for varying interval lengths is investigated in a simulation study and compared with simpler alternatives.
Mean survival times estimated from the register data ranged from 1.2 (SE = 0.09) to 3.2 (0.31) years depending on period of diagnosis and choice of parametric model. The shape of the censoring distribution within intervals did generally not influence results, whereas the choice of parametric model did, even when different models fit the data equally well. In simulation studies both simple midpoint imputation and multiple imputation yielded nearly unbiased analyses (relative biases of -0.6% to 9.4%) and confidence intervals with near-nominal coverage probabilities (93.4% to 95.7%) for censoring intervals shorter than six months. For 12 month censoring intervals, multiple imputation provided better protection against bias, and coverage probabilities closer to nominal values than simple midpoint imputation.
Binning of event and censoring times should be considered a viable strategy for anonymizing register data on survival times, as they may be readily analyzed with methods based on multiple imputation.
Individualized register data are routinely collected in many countries on a broad variety of diseases, and are becoming an indispensable source of information for health research. However, as informed consent is not obtained from patients, preservation of anonymity is a key concern when allowing researchers access to register data. Often, access is only allowed to binned data in which individuals can no longer be identified. Such data may pose an analytic challenge to researchers since dedicated statistical procedures for this situation are not readily available in standard statistics software, and hence there is a need for general purpose strategies that can be easily implemented in these settings.
The example of binned register data which we will study in this paper arose in the context of colorectal cancer, one of the most frequent malignancies in industrialized countries. Since a substantial proportion of patients either have clinical metastases at the point of diagnosis or develop them in the course of the disease, the prognosis is poor. For Norwegian patients diagnosed in the period 1997-2001 with metastatic rectal cancer (ICD9 C19-21), the survival was limited with a 10.4% 5-year survival for males and 7.8% for females . Corresponding figures for colon cancer (ICD9 C18) were 7.4% for males and 8.6% for females, ibid. In a pivotal study by Hurwitz et al , patients with metastatic colorectal cancer were randomized to either conventional chemotherapy alone or conventional chemotherapy with addition of bevacizumab. Hurwitz et al found a significant treatment effect for adding bevacizumab (hazard ratio for mortality: 0.66, p < 0.001).
Consequently, the Norwegian Knowledge Centre for Health Care in spring 2007 was commissioned to undertake a health technology assessment of bevacizumab as guidance for the decision on introducing the drug into standard Norwegian treatment practice. As a key part of the assessment, estimates of recent and current mean survival times in this group of patients were requested, since it was believed that the prognosis had changed over the last two decades and was potentially quite different from those observed in the Hurwitz et al study. In 1991, Norwegian physicians' attitude to metastatic colorectal cancer was pessimistic, as only few patients had surgery to remove liver metastases and/or received chemotherapy--and if the latter then 5-fluorourcil only. In recent years, oncologists search more actively for metastases, which are then likely detected earlier; liver metastases are removed if technically possible and the patient is otherwise in good condition; chemotherapy is offered more frequently, and then usually oxaliplatin or irinotecan based.
To preserve anonymity of individual patients, CRN provided counts of deaths and censoring events without any patient specific information grouped into 30 day intervals (termed months in the following). Further, since the clinical trial reported by Hurwitz et al excluded patients with a prognosis of less than three months of survival (90 days) , it was argued that exclusion of all deaths and loss-to-follow-up events occurring within three months of diagnosis would improve comparability with the clinical trial data. Note, however, that in the clinical trial, deaths did occur also within the first three months, and so the truncation in the Norwegian registry not only removed patients ineligible for treatment with bevacizumab, but inadvertently also those who merely happened to have short survival times and who would otherwise have been eligible.
Although the anonymized data from CRN are interval censored, they are neither of the standard type I or type II interval censored data. Type I is known as current status data, where survival times are only known to be smaller or larger than a given point in time. In type II data, event times are observed to belong to intervals where either both limits are finite and observed, or alternatively one is finite and observed and the other is infinite--see  for a detailed discussion. Thus, the monthly counts of deaths are type II interval censored, but the monthly counts of loss-to-follow-up are not, as they do not assign a definitive (left) limit to the interval containing the event. Put differently, the CRN data includes interval censoring of censoring times (loss-to-follow-up). From another perspective, the CRN data may be viewed as an example of life table data with left truncation, but life table data are for non-standard analyses best regarded as interval censored . While it is known that Maximum Likelihood Estimation (MLE) is generally superior to other analytic strategies for interval censored data [5-7], it is not straightforward to conduct a full MLE analysis for this type of data, unless one is willing to adopt restrictive assumptions on the censoring distribution (see below). Using multiple imputation for analyzing interval censored data has been suggested by several, for example [8,9], but not for data with interval censored censorings.
The objective of the present paper is to investigate if a multiple imputation strategy can be utilized to validly fit parametric models to the CRN data. Secondly, it is to develop diagnostic procedures that can be used to assess the fit of any parametric distribution. Finally, to investigate whether mean survival times comparable to those obtained from the clinical trial data by Tappenden et al  can be reliably estimated, i.e. are reasonably robust to choice of parametric distribution.
The paper is organized as follows: First in the Methods section the CRN data are described and the model is introduced. We then describe a simulation study to assess the validity and statistical efficiency of the developed strategy, develop diagnostic procedures, and introduce a simulation strategy for estimating uncertainty of estimated mean survival times. Parameterization of distributions and details of the implemented strategies are presented at the end of the Methods section. The Results section first reports the results of the simulation study before proceeding to the results of analyzing the CRN data. Finally, results and their implications are discussed.
The Cancer Registry of Norway holds follow-up data on all patients diagnosed with malignancies in Norway in the period 1991-2005. For the present study, data was requested on all patients diagnosed with colorectal cancer (ICD9 codes C18, C19, and C20) for the first time, and where patients at the time of diagnosis had distant metastases (stage 4) and were less than 70 years at diagnosis. Patients who either died or were lost to follow-up earlier than 90 days after diagnosis were excluded, as it was assumed that the remaining patients would be more comparable to those included in the Hurwitz trial . For the included patients, CRN only provided 30 day counts of deaths and loss-to-follow-up, respectively, due to confidentiality concerns. CRN provided data separately for patients diagnosed within each of three time periods (1991-6, 1997-2001, and 2002-5) to allow period specific estimation. All patients were followed until death or Jan 1, 2006, whichever came first. The data provided by CRN are summarized in Table 1, where they have been binned in years for legibility. The full data set is available upon request from the lead author.
Table 1. Deaths and censorings among patients with colorectal cancer, Norway, 1991-2005
Ordinary model for survival times
Let Y be time to death after diagnosis with survivor distribution function SY (·; θ), density function fY (·; θ), and hazard hY (·; θ), where θ is a parameter vector. Let Z be a censoring event so that the observable time variable, X, is the minimum of Y and Z, accompanied by an indicator variable denoting which type of event was observed, i.e.:
If censoring is assumed independent, the marginal likelihood based on all n individuals for inference on θ, where the censoring contributions have been factored out, looks as follows
with x = (x1, x2, ..., xn) and δ = (δ1, δ2, ..., δn).
Model for truncated and interval censored data
In the data provided by CRN, we are not observing from the entire distribution of survival times--events are only observed if they exceed a constant M which here is 90 days--and so the likelihood given above in Equation 1 must be modified accordingly. Handling such truncation, or delayed entry as it is, corresponds to considering the conditional distribution of X given that X ≥ M. Hence the likelihood becomes
Further in the CRN data, all event times, X, are subject to interval censoring induced by a sequence of time points, t1, t2, ..., tm+1. This means that for each subject only the interval, [tj; tj+1), and the type of event, δ, is observed. Let n1j and n0j be the counts of Xi's falling within a given period for the two event types, i.e. n1j is the number of deaths and n0j the number of censorings between tj and tj+1, respectively. If we let gj be the conditional density of censoring events on the interval [tj ; tj+1) given it occurs in the interval, the likelihood takes the following form when not taking truncation into account
where t is the j × 2 matrix with interval end points, tj and tj+1, as rows, and n is the corresponding j × 2 matrix with rows of event counts, n0j and n1j.
If events are subject to both truncation and interval censoring the likelihood takes the form
Equations 2 and 3 implies that for censored events the likelihood involves a term for the censoring distribution, even though censoring is considered to be independent or non-informative. If gj is non-constant over [tj ; tj+1), then even under an assumption of independent censoring, its shape will be informative when evaluating these contributions as it cannot be factored out. It is thus natural to consider gj being constant over [tj ; tj+1) as a starting point for analysis, since only this model (the simplest possible) will for most parametric distribution allow direct analytical evaluation of the integrals. For example, for the Weibull survivor function direct analytical evaluation is possible only with uniform gj 's and then by use of the incomplete Γ-functions. While the time intervals [tj ; tj+1) are all rather short--making the assumption realistic--there are drawbacks to this direct approach. Besides being complex to implement numerically, it more importantly suffers from not allowing the distributional shapes of gj to be varied easily. Sensitivity analyses can thus not be readily undertaken using this approach. Instead, we suggest to consider this a missing data problem. From this point of view, a simple multiple imputation strategy lends itself naturally to be applied here: Generate for each censored individual i a censoring time (uil) that is independent and distributed according to gj on the interval wherein censoring is known to have happened--this creates a completed dataset which we label by l. This is repeated k times to result in k completed datasets. The l'th dataset can be analyzed by maximizing the following marginal likelihood, where the censoring distribution has been factored out due to the assumption of non-informativeness:
Rubin's formula can be applied to obtain a combined estimate with accompanying uncertainty measures , and sensitivity analyses can straightforwardly be conducted by generating uil's from various distributions. As distributions we considered those shown in Figure 1, with the diagonal line representing the uniform distribution. Note that the distributions presented in Figure 1 are the conditional distributions on single sub-intervals, i.e. they are not representing the global censoring distribution.
Figure 1. Censoring distribution functions for sensitivity analysis. Distribution functions used in the sensitivity analysis on the impact of assuming different shapes of the censoring distribution. Gj is the conditional distribution of Z for each of the intervals [tj ; tj+1), j = 1, ..., m, given that Z belongs to the j'th interval.
To allow comparison with simpler analytical strategies, we also present analyses based on single, midpoint imputation (replacing intervals with their midpoint) or ignoring truncation. If all censoring events are replaced by the midpoint of the interval they occur in, but event times are maintained as interval censored, we base the analysis on the following likelihood
If also the event times are replaced by the midpoint of the relevant interval, the likelihood becomes
To ignore truncation, the term SY (M)-1 is removed from the relevant likelihood.
To investigate the statistical properties of the multiple imputation strategy developed above, we set up a simulation study in which we compared the multiple imputation strategy with a simple, single imputation of interval midpoints for interval censored censoring events (Equation 5 above). We generated event times as Weibull distributed (SY (t; λ, γ) = exp(-λyγ)), where we as true values of the parameters used those estimated for the last period of the Norwegian registry data (log λ = -0.5902 and γ = 0.9425, cf. results below). Censoring was taken to be non-informative and occurring with constant rate corresponding to annual proportions of 3%, 6%, and 9%, respectively. All event times were censored at the end of follow-up, which occurred at ten years. We varied the width of censoring intervals (1, 2, 6, and 12 months, where each month is 30 days) and the sample size (500, 1,000, and 10,000). In each of the 36 settings we analyzed 2,500 generated datasets. For each setting we report the median relative bias (the difference between the median of estimates and the true value relative to the true value), the empirical coverage probability of 95% confidence intervals, and the inflation factor of standard errors. The latter is defined as the ratio between the median standard error obtained with interval censored data and the median of standard errors obtained from analysis of identical data that have not been subjected to interval censoring, but only to ordinary right censoring. Note, that the performance of the method is not evaluated under misspecification (in practical applications misspecification should be dealt with using the tools developed below) as the focus is here only on studying the ability to handle interval censoring.
To conduct valid parametric analyses it is essential to have access to diagnostic procedures that allow assessing the fit of the chosen parametric family. As the data under study are already binned, we propose a variant of the χ2 goodness-of-fit test that accommodates censoring.
The expected counts can be derived from considering the following density, evaluated at the fitted parameter values:
While fY was estimated above, we now additionally need an estimate of the censoring distribution, specifically SZ. Had Y and Z not been interval censored, it would have been straightforward to get an estimate of SZ via Kaplan-Meier with deaths considered as censoring events and censorings as failure events. We thus propose to impute Y and Z and then estimate SZ by use of the Kaplan-Meier estimator, before computing expected values for each interval by averaging over imputations. Specifically, we suggest the following algorithm for generating imputed datasets:
2. For each censoring event Zi in interval [tj; tj+1), the time of censoring is imputed from a uniform distribution over this interval.
3. From the imputed dataset, the Kaplan-Meier estimate of SZ is found by considering loss-to-follow-up as event times and deaths as censoring events.
4. From the fitted distributions for Y and Z, the density in Equation 7 is numerically integrated over each relevant time interval. The expected count for each interval is found by multiplication with the total number of observed events.
Steps 1 to 4 are repeated to create 10 imputed datasets with expected counts. The overall expected count is then found by averaging over the imputed datasets.
Finally, the χ2 goodness-of-fit test statistic was computed as follows: If the expected number of events in an interval was smaller than five, the interval was joined with its lower neighbor. This procedure was repeated until all expected values were greater than five. The goodness-of-fit test statistic was evaluated as
where and are the index and count of joined intervals, respectively, and is the observed count of events in the 'th interval. The test statistic was evaluated in the χ2 distribution with degrees of freedom, where q is the number of parameters fitted in the parametric model.
Estimation of mean survival time
For all the distributions analyzed in this paper (see Table 2), the mean survival could be computed directly from the estimated parameters, except for the Gompertz distribution. The Gompertz estimates did however correspond to infinite mean survival, and so no alternative, numerical strategy was needed in this case. To reflect the uncertainties of the estimated parameters in computed means, we used a simulation strategy since this avoids the assumption of normality of the estimated mean and the assumption of differentiability, both of which are required by the delta method. For each parameter set, we thus sampled 10,000 times from a multivariate normal distribution defined by the estimate and their estimated covariance matrix, and for each sampled value computed the corresponding mean. As overall estimate of the mean survival time we used the median of the means accompanied by an empirical 90% confidence interval, i.e. the 5% and 95% percentiles in the sampled distribution of means.
Table 2. Parameterization of distributions
The parametric distributions considered and their parameterization are shown in Table 2. All analyses were conducted using Stata 9 . More specifically, the likelihoods in Equations 1 and 4, respectively, were both coded and evaluated using the general maximum likelihood command, -ml-, available in Stata 9, cf. . The multiple imputation used the -micombine- command of the ICE add-on package to obtain joint estimates across imputations . Throughout we have chosen the number of imputations to be 10 [11, p. 114]. Examples of code used for the computations are available upon request to the authors.
Table 3 presents relative bias, coverage probabilities and inflation of standard errors when either using single midpoint imputation or multiple imputation for interval censored censoring events. When interval censoring is induced by intervals with lengths up to six months, both analytic strategies perform well with negligible bias, coverage probabilities of confidence intervals close to the nominal value, and standard errors that are only marginally increased relative to the ordinary situation, i.e. when only ordinary right censoring is present. As a general tendency, however, the multiple imputation has lower relative bias and better coverage, in particular when censoring becomes more dominant in terms of higher censoring rates and wider censoring intervals. Only in extreme cases of 6% and 9% annual censoring proportions, censoring intervals of one year length, and larger sample sizes of 10,000, does coverage probabilities of the multiple imputation strategy decrease unacceptably to levels around 75% to 80%, albeit not as low as when single midpoint imputation is used. This poor performance in extreme cases is to be expected, as the conditional censoring distribution is taken to be uniform in the multiple imputation analysis, while censoring times are actually generated from an exponential model--it is this effect that becomes apparent when datasets become large and intervals very wide, but not before. In all situations, both analytic strategies yield virtually identical statistical efficiency, which depend mostly on the width of censoring intervals, and less on the rate of censorings. The loss in statistical efficiency is twice as high for the shape parameter γ as for the scale parameter log(λ).
Table 3. Simulation results comparing analyses with multiple imputation or single midpoint imputation in terms of bias, coverage probability and relative efficiency
All the five fitted distributions (Gamma, Gompertz, Log-Logistic, Log-Normal, and Weibull) fitted the data well in the last period (2002-2005), cf. Table 4. None of the distributions achieved adequate fit for the other periods, including when all data was pooled, although the Gompertz according to p-values and diagnostic plots came close for the first period, 1991-1996 (plots not shown, available upon request). Figure 2 displays diagnostic plots for the Weibull distribution in each of the three periods separately and when pooled, which supports that the fit was adequate only in the last period, 2002-2005. In the other periods, the number of events is generally overestimated in the beginning, then underestimated from six months to approximately three years, and finally overestimated at the right tail. As the last period has a shorter follow-up period, the fit of any parametric model is only determined by the events in the earlier part of the follow-up period. Plots for the other distributions are available upon request.
Table 4. Goodness-of-fit tests and estimated mean survival times for five parametric distributions, patients with metastatic colorectal cancer, Norway 1991-2005
Figure 2. Goodness-of-fit plots. Difference between expected and observed counts of events over the time scale. χ2 is the goodness-of-fit test statistic value with d.f. degrees of freedom and associated p-value. Note that the plot for all periods pooled has a differently scaled Y-axis than the period specific plots.
The estimated means varied substantially between the distributions, both when the fit was poor (long follow-up time, 1991-1996), and when the fit was good (shorter follow-up time, 2002-2005). The means ranged from 1.89 years (Gamma) to 3.14 years for the Log-Logistic and even infinity for the Gompertz in the last period, 2002-2005 (Table 4). The likelihood ratio test for homogeneity of Weibull distributions across periods showed statistical significance with a χ2 value of 122.7 on 4 degrees of freedom yielding p ≪ 0.0001. Note, that while mean survival times varied between periods, the direction of change was not consistent from distribution to distribution: For the Weibull, Log-Normal, and the Gamma distribution mean survival time increased with period, the Gompertz found it to be infinite in all periods, whereas the Log-Logistic suggested an increase between the first two periods and then a small decline for the last. Both the Gamma and Log-Logistic distributions have asymmetric confidence intervals for the mean, something that would not easily have been detected if we had used the delta method for estimating means from the parameter estimates. Based on the trial by Hurwitz et al , the mean survival was estimated to 1.98 years in the intervention group and 1.57 years in the control group by Tappenden et al .
For the Weibull distribution we estimated parameters using different approaches for handling the interval censoring and truncation, cf. Table 5. As expected from the simulation study above, the use of single midpoint imputation for censoring events in this situation yields virtually identical estimates to those based on multiple imputation--a consequence of the short censoring intervals. Results based on imputing midpoints of intervals for both events and censoring differ slightly more, but are for all practical purposes still identical. Ignoring the truncation does however create noticeably bias in the estimated mean survival time.
Table 5. Weibull Parameter estimates and estimated mean survival times, patients with metastatic colorectal cancer, Norway 1991-2005
Table 6 gives the results of applying the four different assumptions regarding the shape of the censoring distribution within each interval of one month length when using a Weibull distribution for event times. We only give results for the last time period and for all three time periods joined together, but results for the other time periods (not shown) were similar in the sense that changes in estimates were negligible with respect to choice of censoring distribution. This is a result of the censoring intervals being short in this case (one month), but in applications with longer intervals we would suggest mimicking this sensitivity analysis, as the assumed shape of censoring distribution becomes more important with long intervals.
Table 6. Results of sensitivity analysis with respect to the assumed shape of the interval specific censoring distribution
The analyses above showed how a multiple imputation strategy combined with a relatively simple maximum likelihood estimation procedure could yield a flexible and valid parametric analysis of interval censored data, and at the same time avoid numerical complexities. Based on the estimated parameters, mean survival times and their uncertainty could be estimated, and finally a goodness-of-fit test was implemented by utilizing the inherent binning of the interval censored data together with a multiple imputation strategy. The simulation studies demonstrated that as long as intervals are not too wide, the multiple imputation is a valid analytic strategy with low loss of statistical efficiency relative to analyses of data without interval censoring.
In the simulation study we focused attention on the statistical properties of the imputation strategy when the parametric model is correctly specified. As shown in our analysis of the Norwegian register data, it is however often not simple in actual applications to identify which model is correctly specified--if such a tractable model exist at all--even with adequate statistical diagnostic procedures. While the problem is not restricted to binned data, it may become attenuated by the binning, as it may make the detection of deviations from the assumed distribution more difficult. If this aspect should have been studied in a simulation study, one might suggest to compare the results of misspecified analyses based on ordinary right censored data and binned data, respectively. We are however confident that only when intervals become wide will the problem of misspecification have the potential to become more pronounced than in the ordinary right censored situation, as for short intervals results are virtually identical between analyses based on right censored and binned data. Our simulation study shows that the imputation method should with wide intervals be used cautiously anyway, as the bias is then large even when the model is correctly specified.
While the CRN data were truncated and interval censored, it might be argued that both features were not prominent for the data: Only the first 90 days of data are discarded, and the binning in 30 day intervals is but a fine grained filtering. Even though three months is short compared to the length of follow-up--in particular for the earlier cohorts--we showed that three months is considerable with respect to estimated means, and this aspect of the data can thus not be ignored. The simulation studies revealed that the impact of interval censoring was generally less important, especially for narrow intervals. Even so, the simulation studies revealed that while single midpoint imputation yielded identical results for narrow intervals, the suggested multiple imputation strategy provided better protection against bias and confidence intervals with coverage probabilities closer to nominal values in situations with wider censoring intervals. Although both deaths and censoring events were interval censored in the CRN data, multiple imputation was only done with respect to censoring events in the estimation process. The rationale was that the distribution of deaths was predetermined by a parametric distribution, and so their likelihood contributions were given and straightforward to calculate. The censoring distribution on the other hand is not itself of interest, and so the focus here is on sensitivity of main parameter estimates to different choices of the distribution of censoring. If the interest had been in conducting semi-parametric estimation (Cox regression) of the event distribution, then multiple imputation for events (deaths) might be a simple alternative to dedicated methods for interval censored data.
Finally, it should be noted that although cost-effectiveness analyses may mandate estimation of mean survival times, this should not be considered a trivial endeavor, see for example . In most studies involving survival times, the right tail of the distribution is unobserved due to right censoring, and yet this tail is highly influential on the mean--even in situations where only a small proportion survives past the end-of-follow-up and different parametric distributions fit equally well, as documented in our analyses above. It is for this reason that the mean restricted to the observation period is commonly used in cost-effectiveness analyses, although it makes comparisons with studies with different lengths of follow-up impossible. The estimated mean from a parametric model will necessarily depend on the specific shape implicitly assumed for the unobserved part of the distribution, and so the sensitivity of the mean to distributional assumptions should be explored whenever possible, as we have done here.
Provision of binned data to maintain anonymity of patients should be considered a viable procedure, since a multiple imputation strategy can be used to account for the interval censoring created by the binning, as long as intervals are not too wide.
The authors declare that they have no competing interests.
HS and ISK jointly developed the original idea for the study. HS developed the statistical analyses and conducted them, and drafted the original and revised manuscripts. ISK provided access to data. The interpretation of data analyses was the product of discussions between the authors. ISK commented on drafts of the paper. Both authors have seen and approved the final version.
Hurwitz H, Fehrenbacher L, Novotny W, Cartwright T, Hainsworth J, Heim W, Berlin J, Baron A, Griffing S, Holmgren E, Ferrara N, Fyfe G, Rogers B, Ross R, Kabbinavar F: Bevacizumab plus irinotecan, fluorouracil, and leucovorin for metastatic colorectal cancer.
The New England journal of medicine 2004, 350:2335-42.
PMID: 15175435PubMed Abstract | Publisher Full Text
The Annals of occupational hygiene 2007, 51:611-32.
PMID: 17940277PubMed Abstract | Publisher Full Text
Nelson CL, Sun JL, Tsiatis AA, Mark DB: Empirical estimation of life expectancy from large clinical trials: Use of left-truncated, right-censored survival analysis methodology. [http://dx.doi.org/10.1002/sim.3355] webcite
Statistics in Medicine 2008., 9999