Research article

Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system

Qing T Zeng1*, Sergey Goryachev1, Scott Weiss2, Margarita Sordo2, Shawn N Murphy3 and Ross Lazarus2

Author affiliations

1 Decision Systems Group, Brigham and Women's Hospital, Boston, MA, USA

2 Channing Laboratory, Brigham and Women's Hospital, Boston, MA, USA

3 Laboratory of Computer Science, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Citation and License

BMC Medical Informatics and Decision Making 2006, 6:30. doi:10.1186/1472-6947-6-30

Published: 26 July 2006

Abstract

Background

The text descriptions in electronic medical records are a rich source of information. We have developed a Health Information Text Extraction (HITEx) tool and used it to extract key findings for a research study on airways disease.

Methods

The principal diagnosis, co-morbidity and smoking status extracted by HITEx from a set of 150 discharge summaries were compared to an expert-generated gold standard.

Results

Excluding cases labeled "Insufficient Data" in the gold standard, HITEx achieved an accuracy of 82% for principal diagnosis, 87% for co-morbidity, and 90% for smoking status extraction.
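
To make the exclusion rule concrete, the following is a minimal sketch of how an accuracy figure of this kind could be computed: cases whose gold-standard label is "Insufficient Data" are dropped before extracted labels are compared with the gold standard. The function name, the example labels other than "Insufficient Data", and the toy data are illustrative assumptions; this is not the authors' evaluation code and it does not reproduce the study's data.

```python
from typing import Sequence

# Label taken from the abstract; all other labels below are hypothetical.
INSUFFICIENT = "Insufficient Data"


def accuracy_excluding_insufficient(
    gold: Sequence[str], predicted: Sequence[str]
) -> float:
    """Fraction of cases where the extracted label matches the gold label,
    ignoring cases the gold standard marks as "Insufficient Data"."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Keep only the cases that the gold standard could actually label.
    scored = [(g, p) for g, p in zip(gold, predicted) if g != INSUFFICIENT]
    if not scored:
        raise ValueError("no cases left after excluding Insufficient Data")

    correct = sum(1 for g, p in scored if g == p)
    return correct / len(scored)


# Toy example (made-up labels, not study data).
gold_labels = ["Current Smoker", "Never Smoked", "Insufficient Data", "Past Smoker"]
extracted = ["Current Smoker", "Past Smoker", "Never Smoked", "Past Smoker"]
toy_accuracy = accuracy_excluding_insufficient(gold_labels, extracted)
```

In the toy data one of the four cases is excluded, so the accuracy is computed over the remaining three cases rather than over all four.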

Conclusion

We consider the results promising, given the complexity of the discharge summaries and the extraction tasks.