
Open Access Research article

De-identification of primary care electronic medical records free-text data in Ontario, Canada

Karen Tu1,2,3*, Julie Klein-Geltink1, Tezeta F Mitiku1, Chiriac Mihai1 and Joel Martin4

Author affiliations

1 Institute for Clinical Evaluative Sciences (ICES) G106, 2075 Bayview Avenue, Toronto, Ontario, M4N 3M5, Canada

2 Department of Family and Community Medicine-University of Toronto, 263 McCaul Street, 5th Floor Toronto, Ontario, M5T 1W7, Canada

3 Toronto Western Hospital Family Health Team-University Health Network, 399 Bathurst Street, Toronto, Ontario, M5T 2S8, Canada

4 Institute for Information Technology, National Research Council, 1200 Montreal Road, Ottawa, Ontario, K1A 0R6, Canada


Citation and License

BMC Medical Informatics and Decision Making 2010, 10:35  doi:10.1186/1472-6947-10-35

Published: 18 June 2010



Electronic medical records (EMRs) represent a potentially rich source of health information for research, but the free-text in EMRs often contains identifying information. While de-identification tools have been developed for free-text, none has been developed or tested for the full range of primary care EMR data.


We modified deid, an open-source de-identification program, for the Ontario context and applied it to primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets: a random sample of 700 free-text EMR records from 17 physicians at 7 practices in 5 cities, and 500 free-text records from a group practice in a different city from the practice used for the training set. We measured the sensitivity (recall), precision, specificity, accuracy and F-measure of the modified tool against manually tagged free-text records in removing patient and physician names, locations, addresses, and medical record, health card and telephone numbers.
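The five measures named above can be sketched from a confusion matrix. This is a minimal illustration using the standard textbook definitions; the abstract does not say what unit was counted (token, phrase or record) or how the F-measure was weighted, so the balanced F1 form, the `Counts` class and the example counts below are all assumptions, not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical confusion-matrix counts against the manual tagging."""
    tp: int  # identifying items the tool removed
    fp: int  # non-identifying items the tool removed
    tn: int  # non-identifying items the tool kept
    fn: int  # identifying items the tool missed


def metrics(c: Counts) -> dict[str, float]:
    """Standard definitions; F-measure assumed to be the balanced F1."""
    sensitivity = c.tp / (c.tp + c.fn)  # also called recall
    precision = c.tp / (c.tp + c.fp)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Made-up counts for illustration only; they are not taken from the study.
result = metrics(Counts(tp=80, fp=10, tn=90, fn=20))
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity pull in opposite directions here: removing more text raises sensitivity (fewer identifiers missed) but lowers specificity and precision (more clinical content lost), which is why the study reports all of them together.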


On the training set, the modified program achieved a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The first and second validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84, respectively.
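As a consistency check on the figures above, the reported F-measures can be recomputed from the reported precision and sensitivity. The sketch below assumes the F-measure is the balanced harmonic mean of precision and recall (F1), which the abstract does not state explicitly; the function and variable names are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity/recall, reported F-measure) as given in the abstract.
reported = {
    "training": (0.913, 0.883, 0.90),
    "validation 1": (0.911, 0.867, 0.89),
    "validation 2": (0.874, 0.802, 0.84),
}

recomputed = {
    name: round(f_measure(p, r), 2) for name, (p, r, _) in reported.items()
}

for name, (_, _, f_reported) in reported.items():
    print(f"{name}: reported {f_reported:.2f}, recomputed {recomputed[name]:.2f}")
```

Under that assumption, the recomputed values agree with the reported ones to two decimal places for all three data sets.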


The deid program can be modified to de-identify free-text primary care EMR records with reasonable accuracy while preserving clinical content.