Open Access Research article

A method for inferring medical diagnoses from patient similarities

Assaf Gottlieb1*, Gideon Y Stein2,3, Eytan Ruppin2,4, Russ B Altman1 and Roded Sharan4*

Author Affiliations

1 Departments of Bioengineering & Genetics, Stanford University, 318 Campus Drive, Stanford 94305, USA

2 Sackler School of Medicine, Tel Aviv University, Klausner St., Tel Aviv 69978, Israel

3 Department of Internal Medicine "B", Beilinson Hospital, Rabin Medical Center, 39 Jabotinski St., Petah-Tikva 49100, Israel

4 Blavatnik School of Computer Science, Tel-Aviv University, Klausner St., Tel Aviv 69978, Israel

BMC Medicine 2013, 11:194  doi:10.1186/1741-7015-11-194

Published: 2 September 2013



Clinical decision support systems assist physicians in interpreting complex patient data. However, they typically operate on a per-patient basis and do not exploit the extensive latent medical knowledge in electronic health records (EHRs). The emergence of large EHR systems offers the opportunity to integrate population information actively into these tools.


Here, we assess the ability of a large corpus of electronic records to predict individual discharge diagnoses. We present a method that exploits similarities between patients along multiple dimensions to predict the eventual discharge diagnoses.
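As a rough illustration of similarity-based diagnosis prediction in general, the sketch below ranks candidate diagnoses for a new patient by summing the similarity of the most similar past patients who carried each diagnosis. It is not the authors' method: the single cosine similarity over a numeric feature vector, the neighbour count `k`, and the names `cosine` and `rank_diagnoses` are all assumptions made for this example, standing in for the multiple similarity dimensions mentioned above.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_diagnoses(query, records, k=2):
    """Rank diagnosis codes for `query` by similarity-weighted votes.

    records: list of (feature_vector, set_of_diagnosis_codes) for past
    patients. Both the layout and the voting scheme are hypothetical.
    """
    # Score every past patient against the query and keep the k most similar.
    scored = sorted(
        ((cosine(query, feats), codes) for feats, codes in records),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Each neighbour votes for its own diagnoses, weighted by its similarity.
    votes = {}
    for sim, codes in scored:
        for code in codes:
            votes[code] = votes.get(code, 0.0) + sim

    # Highest total vote first; ties broken alphabetically for determinism.
    return sorted(votes, key=lambda code: (-votes[code], code))


# Toy, made-up records: two features per patient, illustrative codes only.
past_patients = [
    ([1.0, 0.0], {"A"}),
    ([0.9, 0.1], {"A", "B"}),
    ([0.0, 1.0], {"C"}),
]
ranking = rank_diagnoses([1.0, 0.0], past_patients, k=2)
```

With these toy records, diagnosis "A" is supported by both nearest neighbours and so ranks ahead of "B".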


Using demographic, initial blood and electrocardiography measurements, as well as medical history of hospitalized patients from two independent hospitals, we obtained high performance in cross-validation (area under the curve >0.88) and correctly predicted at least one diagnosis among the top ten predictions for more than 84% of the patients tested. Importantly, our method provides accurate predictions (>0.86 precision in cross-validation) for major disease categories, including infectious and parasitic diseases, endocrine and metabolic diseases and diseases of the circulatory system. Our performance applies to both chronic and acute diagnoses.
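The "at least one diagnosis among the top ten predictions" figure can be read as a top-k hit rate. A minimal sketch of how such a rate might be computed is shown below; the function name `top_k_hit_rate` and the input layout are assumptions for illustration, not taken from the paper.

```python
def top_k_hit_rate(ranked_predictions, true_diagnoses, k=10):
    """Fraction of patients with at least one true diagnosis in their top k.

    ranked_predictions: one ranked list of diagnosis codes per patient.
    true_diagnoses: one collection of actual discharge codes per patient.
    Both layouts are hypothetical.
    """
    if not true_diagnoses:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_diagnoses)
        if set(ranked[:k]) & set(truth)  # any overlap counts as a hit
    )
    return hits / len(true_diagnoses)


# Toy, made-up example with two patients and k=2.
predictions = [["A", "B"], ["C", "D"]]
actual = [{"B"}, {"E"}]
rate = top_k_hit_rate(predictions, actual, k=2)
```

Here the first patient's true diagnosis appears in the top two predictions and the second patient's does not, so the rate is one half.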


Our results suggest that one can harness the wealth of population-based information embedded in electronic health records for patient-specific predictive tasks.

Keywords: Patient similarity; Electronic health records; Diagnosis prediction