Open Access Research article

An administrative data merging solution for dealing with missing data in a clinical registry: adaptation from ICD-9 to ICD-10

Danielle A Southern (1,3), Colleen M Norris (4), Hude Quan (1,3), Fiona M Shrive (1,3), P Diane Galbraith (1,3), Karin Humphries (5,6), Min Gao (5,6), Merril L Knudtson (2), William A Ghali (1,2,3)* and the APPROACH Investigators

Author Affiliations

1 Department of Community Health Sciences, University of Calgary, Calgary, AB, Canada

2 Department of Medicine, University of Calgary, Calgary, AB, Canada

3 Centre for Health and Policy Studies, University of Calgary, Calgary, AB, Canada

4 Faculty of Nursing, University of Alberta, Edmonton, AB, Canada

5 Department of Medicine, University of British Columbia, Vancouver, BC, Canada

6 Provincial Health Services Authority, Vancouver, BC, Canada


BMC Medical Research Methodology 2008, 8:1. doi:10.1186/1471-2288-8-1

Published: 23 January 2008



We have previously described a method for dealing with missing data in a prospective cardiac registry initiative. The method involves merging registry data with the corresponding ICD-9-CM administrative data to fill in missing data 'holes'. Here, we describe the process of translating our data merging solution to ICD-10 and then validating its performance.
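As an illustration of the "hole filling" idea, the following minimal sketch copies a value from an administrative record into a registry record only when the registry value is missing. The field names, the record layout, and the `fill_holes` helper are hypothetical and are not taken from the registry or from the published algorithm.

```python
from typing import Dict, Optional

# Hypothetical record layout: None marks a missing registry value.
Registry = Dict[str, Optional[bool]]


def fill_holes(registry: Registry, admin: Dict[str, bool]) -> Registry:
    """Return a copy of the registry record with missing values
    replaced by the matching administrative-data flag, if one exists."""
    filled = dict(registry)
    for field, value in registry.items():
        if value is None and field in admin:
            filled[field] = admin[field]
    return filled


# Illustrative data only (not real patient records).
registry_record = {"diabetes": None, "hypertension": True, "copd": None}
admin_flags = {"diabetes": True, "hypertension": False}

completed = fill_holes(registry_record, admin_flags)
print(completed)
# {'diabetes': True, 'hypertension': True, 'copd': None}
```

In this sketch, values already present in the registry are never overwritten, and a field with no administrative counterpart stays missing.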


A multi-step translation process was undertaken to produce an ICD-10 algorithm, and merging was then implemented to produce complete datasets for 1995–2001 based on the ICD-9-CM coding algorithm, and for 2002–2005 based on the ICD-10 algorithm. We used cardiac registry data for patients undergoing cardiac catheterization in fiscal years 1995–2005. The corresponding administrative data records were coded in ICD-9-CM for 1995–2001 and in ICD-10 for 2002–2005. The resulting datasets were then evaluated for their ability to predict death at one year.
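The year-dependent choice of coding algorithm could be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, and the code lists contain only the standard diabetes categories (ICD-9-CM 250, ICD-10 E10 to E14) as an example, not the full risk-factor definitions used in the study.

```python
from typing import Iterable

# Illustrative code lists only; the study's actual algorithm is not reproduced here.
ICD9_PREFIXES = {"diabetes": ("250",)}
ICD10_PREFIXES = {"diabetes": ("E10", "E11", "E12", "E13", "E14")}


def coding_system(fiscal_year: int) -> str:
    """Map a fiscal year to the coding system used in the administrative data."""
    if 1995 <= fiscal_year <= 2001:
        return "ICD-9-CM"
    if 2002 <= fiscal_year <= 2005:
        return "ICD-10"
    raise ValueError(f"fiscal year {fiscal_year} is outside 1995-2005")


def has_risk_factor(codes: Iterable[str], fiscal_year: int, factor: str) -> bool:
    """Flag a risk factor if any diagnosis code starts with a listed prefix."""
    table = ICD9_PREFIXES if coding_system(fiscal_year) == "ICD-9-CM" else ICD10_PREFIXES
    prefixes = table[factor]
    # Normalise codes so that "250.00" and "25000" are treated alike.
    return any(code.replace(".", "").upper().startswith(prefixes) for code in codes)


print(has_risk_factor(["250.00", "414.01"], 1998, "diabetes"))  # True
print(has_risk_factor(["E11.9", "I25.10"], 2003, "diabetes"))   # True
print(has_risk_factor(["I25.10"], 2003, "diabetes"))            # False
```

Keeping the two code lists in separate tables makes it straightforward to compare risk-factor prevalence on either side of the coding change.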


The prevalence of the individual clinical risk factors increased gradually across years. There was, however, no evidence of either an abrupt drop or rise in prevalence of any of the risk factors. The performance of the new data merging model was comparable to that of our previously reported methodology: c-statistic = 0.788 (95% CI 0.775, 0.802) for the ICD-10 model versus c-statistic = 0.784 (95% CI 0.780, 0.790) for the ICD-9-CM model. The two models also exhibited similar goodness-of-fit.
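For readers unfamiliar with the c-statistic, the sketch below computes it from its definition: the proportion of (died, survived) patient pairs in which the patient who died was assigned the higher predicted risk, with ties counted as one half. The function name and the toy data are illustrative assumptions; this is not the software used in the study.

```python
from typing import Sequence


def c_statistic(risks: Sequence[float], died: Sequence[int]) -> float:
    """Concordance statistic for predicted risks against a binary outcome."""
    concordant = 0.0
    pairs = 0
    for risk_event, outcome_event in zip(risks, died):
        if not outcome_event:
            continue  # the outer loop runs over patients who died
        for risk_non, outcome_non in zip(risks, died):
            if outcome_non:
                continue  # the inner loop runs over survivors
            pairs += 1
            if risk_event > risk_non:
                concordant += 1.0
            elif risk_event == risk_non:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one death and one survivor")
    return concordant / pairs


# Toy data: predicted one-year mortality risk and observed outcome (1 = died).
predicted = [0.9, 0.4, 0.35, 0.1]
observed = [1, 0, 1, 0]
print(c_statistic(predicted, observed))  # 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination. The pairwise loop is quadratic in the number of patients, so a rank-based implementation would be preferable for a dataset the size of a registry.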


The ICD-10 implementation of our data merging method performs as well as the previously validated ICD-9-CM method. Such methodological research is an essential prerequisite for research with administrative data now that most health systems are transitioning to ICD-10.