Open Access Research article

Is it possible to identify cases of coronary artery bypass graft postoperative surgical site infection accurately from claims data?

Tsung-Hsien Yu1,2, Yu-Chang Hou1,3,4, Kuan-Chia Lin5 and Kuo-Piao Chung1,2*

Author Affiliations

1 Institute of Healthcare Policy and Management, National Taiwan University, Taipei, Taiwan

2 Master Degree Program of Public Health, National Taiwan University, Taipei, Taiwan

3 Department of Chinese Medicine, Tao-Yuan General Hospital, Ministry of Health and Welfare, Taoyuan, Taiwan

4 Department of Bioscience Technology, Chuan-Yuan Christian University, Taoyuan, Taiwan

5 Department of Health Care and Management, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan

BMC Medical Informatics and Decision Making 2014, 14:42  doi:10.1186/1472-6947-14-42

Published: 29 May 2014



Claims data have frequently been used in recent studies to identify cases of healthcare-associated infection. However, several studies have indicated that ICD-9-CM codes might be inappropriate for identifying such cases from claims data; several researchers have therefore developed alternative identification models to identify more cases correctly. The purpose of this study was to investigate three common approaches to developing alternative models for identifying cases of coronary artery bypass graft (CABG) surgical site infection, and to compare the performance of these models with that of the ICD-9-CM model.


The 2005–2008 National Health Insurance claims data and healthcare-associated infection surveillance data from two medical centers were used in this study for model development and model verification. In addition to ICD-9-CM codes, this study used classification algorithms, a multivariable regression model, and a decision tree model to develop alternative identification models. For the classification algorithms, we defined three levels of criteria by strictness: strict, moderate, and loose. Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were used to evaluate the performance of each model.
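The five performance measures named above can all be derived from a 2x2 table that compares a model's case flags with a reference standard (here, presumably the surveillance data). The following is a minimal sketch, not the authors' code: the names `ConfusionCounts`, `confusion_counts`, and `performance` are hypothetical, and the counts in the usage note are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: model flag versus reference standard."""
    tp: int  # flagged by the model, infection confirmed by the reference
    fp: int  # flagged by the model, no infection in the reference
    fn: int  # not flagged, infection confirmed by the reference
    tn: int  # not flagged, no infection in the reference


def confusion_counts(predicted: Sequence[bool],
                     reference: Sequence[bool]) -> ConfusionCounts:
    """Tally the 2x2 table from paired per-patient labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = fp = fn = tn = 0
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            tp += 1
        elif pred and not ref:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return NaN instead of raising when a margin of the table is empty."""
    return numerator / denominator if denominator else float("nan")


def performance(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the five measures used to compare identification models."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


if __name__ == "__main__":
    # Invented counts for illustration only; not results from the study.
    example = ConfusionCounts(tp=8, fp=2, fn=12, tn=978)
    for name, value in performance(example).items():
        print(f"{name}: {value:.3f}")
```

Calling `performance(confusion_counts(predicted, reference))` on paired per-patient labels yields one set of measures per identification model, so the models can be compared side by side.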


The ICD-9-CM-based model showed good specificity and negative predictive value, but its sensitivity and positive predictive value were poor. The performance of the other models varied on every measure except negative predictive value. Among the models, the decision tree model performed excellently, especially in terms of positive predictive value.


Accurate identification of cases of CABG surgical site infection is an important issue in research based on claims data. Using the decision tree model to identify such cases can improve the accuracy of patient-level outcome research. This model should be considered in future research using claims data.

Keywords: Administrative data; Identification model; CABG; Surgical site infection; Decision tree; Classification and regression tree