Log on / register
Feedback | Support | My details
Open AccessResearch article

Threshold protocol for the exchange of confidential medical data

Jules J Berman email

Cancer Diagnosis Program, National Cancer Institute, Bethesda, Maryland, USA

author email corresponding author email

BMC Medical Research Methodology 2002, 2:12doi:10.1186/1471-2288-2-12

Published: 11 November 2002

Abstract

Background

Medical researchers often need to share clinical data without violating patient confidentiality. Threshold cryptographic protocols divide messages into multiple pieces, no single piece containing information that can reconstruct the original message. The author describes and implements a novel threshold protocol that can be used to search, annotate or transform confidential data without breaching patient confidentiality.

Methods

The basic threshold protocol is: 1) Text is divided into short phrases; 2) Each phrase is converted by a one-way hash algorithm into a seemingly-random set of characters; 3) Threshold Piece 1 is composed of the list of all phrases, with each phrase followed by its one-way hash; 4) Threshold Piece 2 is composed of the text with all phrases replaced by their one-way hash values, and with high-frequency words preserved. Neither Piece 1 nor Piece 2 contains information linking patients to their records. The original text can be re-constructed from Piece 1 and Piece 2.

Results

The threshold algorithm produces two files (threshold pieces). In typical usage, Piece 2 is held by the data owner, and Piece 1 is freely distributed. Piece 1 can be annotated and returned to the owner of the original data to enhance the complete data set. Collections of Piece 1 files can be merged and distributed without identifying patient records. Variations of the threshold protocol are described. The author's Perl implementation is freely available.

Conclusions

Threshold files are safe in the sense that they are de-identified and can be used for research purposes. The threshold protocol is particularly useful when the receiver of the threshold file needs to obtain certain concepts or data-types found in the original data, but does not need to fully understand the original data set.


© 1999-2008 BioMed Central Ltd unless otherwise stated. Part of Springer Science+Business Media.