Open Access, Highly Accessed, Research article

Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system

Qing T Zeng1, Sergey Goryachev1, Scott Weiss2, Margarita Sordo2, Shawn N Murphy3 and Ross Lazarus2

1Decision Systems Group, Brigham and Women's Hospital, Boston, MA, USA

2Channing Laboratory, Brigham and Women's Hospital, Boston, MA, USA

3Laboratory of Computer Science, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA


BMC Medical Informatics and Decision Making 2006, 6:30 doi:10.1186/1472-6947-6-30

Published: 26 July 2006

Abstract

Background

The text descriptions in electronic medical records are a rich source of information. We have developed a Health Information Text Extraction (HITEx) tool and used it to extract key findings for a research study on airways disease.

Methods

The principal diagnosis, co-morbidity and smoking status extracted by HITEx from a set of 150 discharge summaries were compared to an expert-generated gold standard.

Results

The accuracy of HITEx was 82% for principal diagnosis, 87% for co-morbidity, and 90% for smoking status extraction, when cases labeled "Insufficient Data" by the gold standard were excluded.
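
The accuracy figures above exclude cases that the gold standard labeled "Insufficient Data". A minimal sketch of that calculation follows, assuming one gold label and one extracted label per case. The function name, the label strings other than "Insufficient Data", and the sample lists are hypothetical and made up for illustration; they are not taken from HITEx or from the study data.

```python
INSUFFICIENT = "Insufficient Data"


def accuracy_excluding_insufficient(gold, predicted, excluded_label=INSUFFICIENT):
    """Return the fraction of cases where predicted equals gold,
    skipping every case whose gold label is the excluded label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Keep only the cases the gold standard could actually label.
    pairs = [(g, p) for g, p in zip(gold, predicted) if g != excluded_label]
    if not pairs:
        raise ValueError("no cases left after exclusion")
    correct = sum(1 for g, p in pairs if g == p)
    return correct / len(pairs)


# Made-up labels for illustration only; not data from the study.
gold = ["Current Smoker", "Non-Smoker", "Insufficient Data", "Past Smoker", "Non-Smoker"]
predicted = ["Current Smoker", "Non-Smoker", "Non-Smoker", "Non-Smoker", "Non-Smoker"]

score = accuracy_excluding_insufficient(gold, predicted)
print(score)  # 3 of the 4 remaining cases match
```

Excluding these cases shrinks the denominator, so the reported accuracy describes only the summaries for which the experts could assign a label.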

Conclusion

We consider the results promising, given the complexity of the discharge summaries and the extraction tasks.

