Table 1

Automatic de-identification systems and their principal characteristics

| 1st author | System name | Availability/License | Programming language/Resources (when known) | Knowledge resources | Document types |
|---|---|---|---|---|---|
| Aramaki [23] | System for the i2b2 de-identification challenge | Not publicly available | CRF++¹ | Lists of names, locations, dates | Discharge summaries |
| Beckwith [14] | HMS Scrubber | Open source (GNU LGPL v2) | Java, JDOM, MySQL | Lists of names, locations | Surgical pathology reports |
| Berman [5] | Concept-Match | System freely available | Perl | UMLS Metathesaurus | Surgical pathology reports |
| Fielstein [7] | (VA system) | Not publicly available | Perl | Lists of names, locations, email addresses | VA compensation and pension examinations |
| Friedlin [8] | MeDS | Not publicly available | Java | Lists of names, locations, medical terms | HL7 messages |
| Gardner [24] | HIDE | Open source (Common Public License v1) | Perl, Java, Mallet² | None | Surgical pathology reports |
| Guo [25] | System for the i2b2 de-identification challenge | Not publicly available | GATE³ (ANNIE, JAPE), Java, SVMlight⁴ | Lists of locations, hospitals | Discharge summaries |
| Gupta [15] | DE-ID (DE-ID Data Corp., Richboro, PA) | Commercial system, not freely available | Unknown | List of U.S. census names, user-defined dictionaries | Surgical pathology reports |
| Hara [27] | System for the i2b2 de-identification challenge | Not publicly available | C++, BACT and YamCha⁵ | None | Discharge summaries |
| Morrison [18] | MedLEE | Not publicly available | Prolog | MedLEE lexicon, UMLS Metathesaurus | Outpatient follow-up notes |
| Neamatullah [9] | (MIT system) | Open source (GNU GPL v2) | Perl | Lists of common English words (non-PHI), terms indicating PHI, names and locations, known PHI (patient and staff lists) | Nursing progress notes, discharge summaries |
| Ruch [19] | MEDTAG framework-based | Not publicly available | Unknown | MEDTAG lexicon (based on UMLS Metathesaurus; only in French) | Various clinical documents (multilingual) |
| Sweeney [20] | Scrub | Not publicly available | Unknown | Lists of area codes, names | Various clinical documents |
| Szarvas [28] | System for the i2b2 de-identification challenge | Not publicly available | Weka⁶ | Lists of first names, locations, diseases, non-PHI (general English) | Discharge summaries |
| Taira [30] | (UCLA system) | Not publicly available | Unknown | Lists of names and drugs | Various clinical documents |
| Thomas [33] | (Regenstrief Institute system) | Not publicly available | Java, XSL | List of names, UMLS Metathesaurus terms | Surgical pathology reports |
| Uzuner [31] | Stat De-id | Not publicly available (open source release planned) | LIBSVM⁷ | MeSH terms, lists of names, locations, and hospitals | Discharge summaries |
| Wellner [32] | System for the i2b2 de-identification challenge | Open source (BSD) | OCaml⁸, Carafe⁹ | Lists of US states, months, common English words | Discharge summaries |


1 http://crfpp.sourceforge.net/

2 http://mallet.cs.umass.edu/

3 http://gate.ac.uk/

4 http://svmlight.joachims.org/

5 http://www.chasen.org/~taku/software/

6 http://www.cs.waikato.ac.nz/ml/weka/

7 http://www.csie.ntu.edu.tw/~cjlin/libsvm

8 http://caml.inria.fr/ocaml/index.en.html

9 http://sourceforge.net/projects/carafe/

Meystre et al. BMC Medical Research Methodology 2010, 10:70. doi:10.1186/1471-2288-10-70
