Open Access Research article

A method for inferring medical diagnoses from patient similarities

Assaf Gottlieb1*, Gideon Y Stein2,3, Eytan Ruppin2,4, Russ B Altman1 and Roded Sharan4*

Author Affiliations

1 Departments of Bioengineering & Genetics, Stanford University, 318 Campus Drive, Stanford 94305, USA

2 Sackler School of Medicine, Tel Aviv University, Klausner St., Tel Aviv 69978, Israel

3 Department of Internal Medicine "B", Beilinson Hospital, Rabin Medical Center, 39 Jabotinski St., Petah-Tikva 49100, Israel

4 Blavatnik School of Computer Science, Tel-Aviv University, Klausner St., Tel Aviv 69978, Israel

BMC Medicine 2013, 11:194  doi:10.1186/1741-7015-11-194

Published: 2 September 2013

Additional files

Additional file 1: Table S2:

ICD codes enriched in extreme-valued blood tests.

Format: PDF Size: 236KB

Open Data

Additional file 2: Figure S1:

Networks of patient similarities. The similarity between patients based on medical history (A), blood test (B) and ECG (C) data.

Format: TIFF Size: 4.7MB

Additional file 3: Figure S2:

The performance of individual features in cross-validation. Displayed are individual feature AUC scores for the USA data (red) and ISR data (blue). The feature abbreviations are: ICD hierarchy-based similarity (I1), ICD empirical similarity (I2), age (A), gender (G), blood tests: average difference (BT1), blood tests: difference between extremes (BT2), ECG tests: average difference (ECG1), ECG tests: difference between extremes (ECG2), medical history (MH1) and medical history based on empirical ICD similarity (MH2).

Format: TIFF Size: 1.8MB

Additional file 4: Figure S3:

The precision in predicting ICD codes as a function of the number of patients in the training set for the USA (A) and ISR (B) datasets.

Format: TIFF Size: 252KB

Additional file 5: Table S1:

Easy-to-predict ICD codes. All p-values are FDR-corrected.

Format: PDF Size: 112KB

Additional file 6: Figure S4:

Prediction precision for ICD level 1 categories. Precision values (blue) and relative prevalence (red) are displayed for the USA (A) and ISR (B) datasets across ICD level 1 categories: Infectious And Parasitic Diseases (A), Neoplasms (B), Endocrine, Nutritional And Metabolic Diseases, And Immunity Disorders (C), Diseases Of The Blood And Blood-Forming Organs (D), Mental Disorders (E), Diseases Of The Nervous System And Sense Organs (F), Diseases Of The Circulatory System (G), Diseases Of The Respiratory System (H), Diseases Of The Digestive System (I), Diseases Of The Genitourinary System (J), Diseases Of The Skin And Subcutaneous Tissue (K), Diseases Of The Musculoskeletal System And Connective Tissue (L), Supplementary Classification Of Factors Influencing Health Status And Contact With Health Services (M) and Classification Of Procedures (N).

Format: TIFF Size: 1.1MB