Table 3

Performance of the Freetext Matching Algorithm (FMA) and MetaMap on the test sets
Algorithm                FMA                 FMA                 MetaMap             MetaMap
Vocabulary               Read/OXMIS          Read/OXMIS          Read/OXMIS          Full Read
Test set                 Death               General             General             General
Number of texts          1000                1000                1000                1000
Number of words          7534                25981               25981               25981
Positive diagnoses detected in free text
True positives           683                 346                 286                 273
False positives          11                  32                  126                 18
False negatives          52                  101                 161                 174
Precision, %             98.4 (97.2, 99.2)   91.5 (88.3, 94.1)   69.4 (64.7, 73.8)   93.8 (90.4, 96.3)
Recall, %                92.9 (90.8, 94.7)   77.4 (73.2, 81.2)   64.0 (59.3, 68.4)   61.1 (56.4, 65.6)
F-score                  0.96                0.84                0.67                0.74
Strictly defined precision for positive diagnoses (best term and correct attribute)
Number strictly correct  625                 315                 260                 247
Precision strict, %      90.1 (87.6, 92.2)   83.3 (79.2, 86.9)   63.1 (58.2, 67.8)   84.9 (80.2, 88.8)
Precision of non-diagnosis positive concepts
True positives           84                  304                 295                 453
False positives          2                   22                  55                  41
Precision, %             97.7 (91.9, 99.7)   93.3 (90.0, 95.7)   84.3 (80.0, 87.9)   91.7 (88.9, 94.0)
Overall precision of positive concepts detected (diagnostic and non-diagnostic)
True positives           767                 650                 581                 726
False positives          13                  54                  181                 59
Precision, %             98.3 (97.2, 99.1)   92.3 (90.1, 94.2)   76.2 (73.1, 79.2)   92.5 (90.4, 94.2)
Precision of negative concepts detected
True positives           5                   57                  0                   92
False positives          5                   18                  0                   33
Precision, %             50.0 (18.7, 81.3)   76.0 (64.7, 85.1)   n/a                 73.6 (65.0, 81.1)
Texts for which the algorithm suggested a better Read term than the original term
Percentage of texts      0                   1.2                 0.5                 0.6
 Dates and durations
True positives 116 96
False positives 15 10
False negatives 25 22
Precision, % 88.5 (81.8, 93.4) 90.6 (83.3, 95.4)
Recall, % 82.3 (74.9, 88.2) 81.4 (73.1, 87.9)
F-score 0.85 0.86
 Test results and quantitative measurements
True positives 105
False positives 11
False negatives 18
Precision, % 90.5 (83.7, 95.2)
Recall, % 85.4 (77.9, 91.1)
F-score 0.88

Comparison of precision (positive predictive value) and recall (sensitivity) of the Freetext Matching Algorithm (FMA) and MetaMap against the gold standard of manual review, for two test sets: ‘General’, a random sample of 500 texts from cases and 500 from controls in a study of coronary artery disease; and ‘Death’, a random sample of 1000 texts associated with Read terms for death or suicide in 2001. Figures in parentheses are 95% confidence intervals; the F-score is the harmonic mean of precision and recall.
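As a rough check on how the table's figures relate to its counts, the sketch below recomputes precision, recall and the F-score from true-positive (TP), false-positive (FP) and false-negative (FN) counts, using F = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall. The table does not state how its intervals were computed; they appear consistent with exact (Clopper-Pearson) 95% binomial intervals (for example, 5 of 10 gives 18.7 to 81.3), so the sketch assumes that method and solves for the limits by bisection on the binomial CDF using only the Python standard library. The names `precision_recall_f` and `clopper_pearson` are hypothetical helpers, not code from the paper or from any library.

```python
import math


def precision_recall_f(tp, fp, fn):
    """Precision (PPV) and recall (sensitivity) as percentages, F-score as a
    proportion, from true-positive, false-positive and false-negative counts.
    A ratio whose denominator is zero is undefined and returned as None."""
    precision = 100 * tp / (tp + fp) if tp + fp else None
    recall = 100 * tp / (tp + fn) if tp + fn else None
    # Harmonic mean of precision and recall, expressed directly in counts.
    f_score = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else None
    return precision, recall, f_score


def _binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def _binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(_binom_pmf(k, n, p) for k in range(x + 1))


def _bisect(f, target, increasing, iterations=100):
    """Find p in [0, 1] with f(p) = target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # The root lies above mid when f is increasing and still below target,
        # or when f is decreasing and still at or above target.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for the proportion x/n, in percent.
    Assumed method: the source table does not name its interval method."""
    if n == 0:
        return None
    # Lower limit solves P(X >= x | p) = alpha/2; this tail grows with p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1.0 - _binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper limit solves P(X <= x | p) = alpha/2; this tail shrinks as p grows.
    upper = 1.0 if x == n else _bisect(
        lambda p: _binom_cdf(x, n, p), alpha / 2, increasing=False)
    return 100 * lower, 100 * upper


if __name__ == "__main__":
    # FMA on the Death test set, positive diagnoses (first column of the table).
    p, r, f = precision_recall_f(tp=683, fp=11, fn=52)
    print(f"{p:.1f} {r:.1f} {f:.2f}")  # 98.4 92.9 0.96
    # FMA on the Death test set, negative concepts: 5 correct out of 10.
    lo, hi = clopper_pearson(5, 10)
    print(f"({lo:.1f}, {hi:.1f})")  # (18.7, 81.3)
```

Writing the F-score directly in counts avoids combining percentages that have already been rounded. If SciPy is available, the same exact limits are the beta-distribution quantiles `scipy.stats.beta.ppf(alpha/2, x, n - x + 1)` and `scipy.stats.beta.ppf(1 - alpha/2, x + 1, n - x)`; the bisection here only avoids that dependency.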

Shah et al.

Shah et al. BMC Medical Informatics and Decision Making 2012, 12:88. doi:10.1186/1472-6947-12-88
