Simple parametric survival analysis with anonymized register data: A cohort study with truncated and interval censored event and censoring times
1 School of Public Health, Biostatistics, Aarhus University, Bartholins Allé 2, DK-8000 Aarhus, Denmark
2 Institute of Health Management and Health Economics, University of Oslo, P.O. Box 1089, Blindern NO-0317, Oslo, Norway
BMC Research Notes 2011, 4:308 doi:10.1186/1756-0500-4-308
Published: 25 August 2011
To preserve patient anonymity, health register data may be provided as binned data only. As an example, we consider how to estimate mean survival time after a diagnosis of metastatic colorectal cancer from Norwegian register data in which time to death or censoring is binned into 30-day intervals. All events occurring in the first three months (90 days) after diagnosis were removed to achieve comparability with a clinical trial. The aim of the paper is to develop and implement a simple yet flexible method for analyzing such interval-censored and truncated data.
Treating interval censoring as a missing data problem, we implement a simple multiple imputation strategy that allows flexible sensitivity analyses with respect to the shape of the censoring distribution. To allow identification of appropriate parametric models, a χ² goodness-of-fit test, also imputation based, is derived and supplemented with diagnostic plots. Uncertainty estimates for mean survival times are obtained via a simulation strategy. The validity and statistical efficiency of the proposed method for varying interval lengths are investigated in a simulation study and compared with simpler alternatives.
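The imputation idea can be illustrated with a minimal sketch. It is not the implementation used in the paper. It assumes an exponential survival model, a uniform distribution of times within each 30-day bin, and Rubin's rules for combining the imputed analyses, whereas the paper obtains its uncertainty estimates through a simulation strategy. All function names, the simulated cohort and its parameters are hypothetical. Midpoint imputation is included as one example of a simpler alternative.

```python
import math
import random
import statistics

BIN_DAYS = 30    # bin width stated in the text
TRUNC_DAYS = 90  # events in the first 90 days were removed (left truncation)


def bin_time(t, width=BIN_DAYS):
    """Return the (lower, upper) bounds of the bin that contains time t."""
    lower = math.floor(t / width) * width
    return lower, lower + width


def fit_exponential(times, events, trunc=TRUNC_DAYS):
    """Exponential MLE under left truncation at `trunc`.

    Returns (estimated mean survival in days, variance of that estimate).
    The variance uses the delta method: var(mean) is roughly mean^2 / deaths.
    """
    deaths = sum(events)
    exposure = sum(t - trunc for t in times)  # time at risk after day 90
    mean = exposure / deaths
    return mean, mean ** 2 / deaths


def midpoint_estimate(bins, events):
    """Single imputation: replace every bin by its midpoint."""
    times = [(lo + hi) / 2 for lo, hi in bins]
    mean, var = fit_exponential(times, events)
    return mean, math.sqrt(var)


def multiple_imputation_estimate(bins, events, m=20, seed=1):
    """Draw m uniform imputations within bins and pool with Rubin's rules."""
    rng = random.Random(seed)
    fits = []
    for _ in range(m):
        times = [rng.uniform(lo, hi) for lo, hi in bins]
        fits.append(fit_exponential(times, events))
    means = [f[0] for f in fits]
    within = statistics.mean([f[1] for f in fits])   # average model variance
    between = statistics.variance(means)             # variance across imputations
    total = within + (1 + 1 / m) * between
    return statistics.mean(means), math.sqrt(total)


def simulate_binned_cohort(n=2000, true_mean=500.0, max_follow=1500.0, seed=0):
    """Hypothetical cohort: exponential survival, uniform censoring, binned times."""
    rng = random.Random(seed)
    bins, events = [], []
    for _ in range(n):
        death = TRUNC_DAYS + rng.expovariate(1 / true_mean)
        censor = rng.uniform(TRUNC_DAYS, max_follow)
        bins.append(bin_time(min(death, censor)))
        events.append(1 if death <= censor else 0)
    return bins, events


if __name__ == "__main__":
    bins, events = simulate_binned_cohort()
    print("midpoint:", midpoint_estimate(bins, events))
    print("multiple imputation:", multiple_imputation_estimate(bins, events))
```

Replacing the `rng.uniform` draw with another within-bin distribution gives a simple sensitivity analysis with respect to the shape of the censoring distribution, and replacing `fit_exponential` with another parametric fit changes the survival model.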
Mean survival times estimated from the register data ranged from 1.2 (SE = 0.09) to 3.2 (SE = 0.31) years depending on period of diagnosis and choice of parametric model. The shape of the censoring distribution within intervals generally did not influence results, whereas the choice of parametric model did, even when different models fit the data equally well. In simulation studies, both simple midpoint imputation and multiple imputation yielded nearly unbiased analyses (relative biases of -0.6% to 9.4%) and confidence intervals with near-nominal coverage probabilities (93.4% to 95.7%) for censoring intervals shorter than six months. For 12-month censoring intervals, multiple imputation provided better protection against bias and coverage probabilities closer to nominal values than simple midpoint imputation.
Binning of event and censoring times should be considered a viable strategy for anonymizing register data on survival times, as such data can be readily analyzed with methods based on multiple imputation.