Table 3

Resources used by systems mostly based on pattern matching and/or rule-based methods.

| De-identification system | Knowledge resources | Principal methods |
|---|---|---|
| Beckwith | Lists of proper names, locations | Regular expressions and dictionaries |
| Berman | UMLS Metathesaurus, stop words | Dictionaries |
| Fielstein | Lists of cities and VA PHI (patient names, SSNs, MRNs...) | Regular expressions and dictionaries |
| Friedlin | Lists of names (including Regenstrief patients), locations | Regular expressions and dictionaries; identifiers in HL7 messages |
| Gupta (De-ID system) | UMLS Metathesaurus, institution-specific identifiers | Regular expressions and dictionaries; identifiers in report headers |
| Morrison (MedLEE) | MedLEE lexicon and UMLS Metathesaurus | Rules/grammar-based, with dictionaries |
| Neamatullah | Lists of common English words (non-PHI), names, locations, UMLS Metathesaurus and other medical terms, known patients and healthcare providers in the institution | Regular expressions and dictionaries |
| Ruch | MEDTAG lexicon (enriched with healthcare institution names, drug names, procedures, and devices) | Rule-based, with dictionaries |
| Sweeney | Lists of names, U.S. states, countries, medical terms | Rule-based, with dictionaries |
| Thomas | List of names, UMLS Metathesaurus, Ispell terms | Regular expressions and dictionaries |

Meystre et al. BMC Medical Research Methodology 2010 10:70 doi:10.1186/1471-2288-10-70
