Table 2

Performance of a selection of drug-disease similarity scores.

Scoring Method                                   Direct Connection Validation AUC  CTD Validation AUC  PREDICT Validation AUC
Corrected drug-disease p-value                   0.65                              0.76                0.66
Cosine distance tf-idf                           0.88                              0.91                0.87
Cosine distance of p-values                      0.64                              0.70                0.52
Cosine distance of term fractions                0.78                              0.83                0.80
Sum of the log of combined p-values              0.92                              0.93                0.80
Sum of the differences of log p values           0.89                              0.86                0.58
L2 of log-p of intersecting terms                0.95                              0.92                0.66
L2 of term fractions of intersecting terms only  0.64                              0.55                0.57
L2 of log of p-values                            0.88                              0.84                0.57
L2 of p-values                                   0.87                              0.82                0.56
L2 of term fractions P(s < S)                    0.85                              0.90                0.78
L2 of term frequency                             0.87                              0.83                0.62
Total number of terms                            0.90                              0.87                0.62
Number of Intersecting Terms                     0.91                              0.91                0.63
Number of Drug Terms                             0.80                              0.83                0.58
Number of Disease Terms                          0.84                              0.83                0.60


Performance was validated using novel direct drug-disease co-occurrences from MEDLINE and novel drug-disease relationships from the CTD. Top scores for each validation set are presented in boldface type.
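
For readers unfamiliar with the AUC values reported above, the following is a minimal sketch of how an area under the ROC curve can be computed for one scoring method against one validation set. It is not the authors' code: the function name `auc`, the toy scores, and the labels are all hypothetical, and it assumes that a higher score indicates a more likely drug-disease relationship (distance-like or p-value-like scores would need to be negated first).

```python
def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U statistic); tied scores share an average rank.

    scores: one similarity score per drug-disease pair (higher = more related; an assumption).
    labels: 1 if the pair is in the validation set, 0 otherwise.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative pair")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy data: four drug-disease pairs, two of them validated.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to a score that ranks validated pairs no better than chance, and 1.0 to a score that ranks every validated pair above every unvalidated one.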

Cheung et al. BMC Medical Genomics 2013 6(Suppl 2):S3   doi:10.1186/1755-8794-6-S2-S3
