Open Access Research article

Managing protected health information in distributed research network environments: automated review to facilitate collaboration

Christine E Bredfeldt1*, Amy Butani2, Sandhyasree Padmanabhan3, Paul Hitz4 and Roy Pardee5

Author Affiliations

1 Mid-Atlantic Permanente Research Institute, Kaiser Permanente in the Mid-Atlantic States, Rockville, MD, USA

2 HealthPartners Institute for Education and Research, Bloomington, MN, USA

3 C-V Sight, Shrewsbury, MA, USA

4 Essentia Institute of Rural Health, Duluth, MN, USA

5 Group Health Research Institute, Seattle, WA, USA

BMC Medical Informatics and Decision Making 2013, 13:39  doi:10.1186/1472-6947-13-39

Published: 22 March 2013



Multi-site health sciences research is becoming more common, as it enables investigation of rare outcomes and diseases and new healthcare innovations. Multi-site research usually involves the transfer of large amounts of research data between collaborators, which increases the potential for accidental disclosures of protected health information (PHI). Standard protocols for preventing release of PHI are extremely vulnerable to human error, particularly when the shared data sets are large.


To address this problem, we developed an automated program (a SAS macro) that identifies possible PHI in research data before the data are transferred between research sites. The macro reviews all data in a designated directory to identify suspicious variable names and data patterns. It looks for variables that may contain personal identifiers such as medical record numbers and social security numbers. In addition, the macro flags dates and numbers that may identify people who belong to small groups and who may therefore be identifiable even in the absence of traditional identifiers.
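The two kinds of check described above (suspicious variable names and suspicious data patterns) can be illustrated with a minimal sketch. This is not the authors' SAS macro: it is written in Python for illustration, and the name fragments, the SSN pattern, the small-group threshold of 5, and the function `review_table` are all assumptions introduced here. Date checks are omitted for brevity.

```python
import re

# Hypothetical name fragments that often signal direct identifiers (an assumption,
# not the list used by the published macro).
SUSPECT_NAME = re.compile(r"(mrn|ssn|name|birth|dob|address|phone|zip)", re.IGNORECASE)
# A value shaped like a U.S. social security number, with or without dashes.
SSN_VALUE = re.compile(r"^\d{3}-?\d{2}-?\d{4}$")
# Hypothetical threshold below which a count may single out a small group.
SMALL_CELL = 5


def review_table(rows, count_columns=()):
    """Return sorted (column, reason) warnings for one data set.

    rows is a list of dicts mapping column names to values;
    count_columns names the columns that hold group counts.
    """
    findings = set()
    for row in rows:
        for column, value in row.items():
            if SUSPECT_NAME.search(column):
                findings.add((column, "suspicious variable name"))
            if SSN_VALUE.match(str(value).strip()):
                findings.add((column, "value looks like an SSN"))
            if column in count_columns and isinstance(value, int) and 0 < value < SMALL_CELL:
                findings.add((column, "small group count"))
    return sorted(findings)


sample = [
    {"patient_mrn": "A100", "note": "123-45-6789", "n_cases": 3},
    {"patient_mrn": "A101", "note": "ok", "n_cases": 40},
]
findings = review_table(sample, count_columns=("n_cases",))
for column, reason in findings:
    print(f"{column}: {reason}")
```

A reviewer would treat each warning as a prompt for manual inspection rather than as proof of PHI, since pattern-based checks of this kind produce false alarms.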


Evaluation of the macro on 100 sample research data sets indicated a recall of 0.98 and precision of 0.81.
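For readers less familiar with these metrics, recall is the fraction of true PHI instances that were flagged, and precision is the fraction of flagged instances that were true PHI. The sketch below computes both from their standard definitions. The counts are hypothetical values chosen only to make the arithmetic easy to follow; they are not the study's data, and the function name `precision_recall` is introduced here for illustration.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute precision and recall from confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)  # flagged items that were real PHI
    recall = true_pos / (true_pos + false_neg)     # real PHI items that were flagged
    return precision, recall


# Hypothetical counts for illustration only (not from the evaluation).
precision, recall = precision_recall(true_pos=8, false_pos=2, false_neg=2)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Read this way, the reported recall of 0.98 means that about 2% of true PHI instances went unflagged, and the precision of 0.81 means that about 19% of warnings were false alarms.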


When implemented consistently, the macro has the potential to streamline the PHI review process and significantly reduce accidental PHI disclosures.

Keywords: HIPAA; Protected health information; Distributed research; De-identification