This article is part of the supplement: Selected articles from the 9th International Workshop on Data Mining in Bioinformatics (BIOKDD)

Open Access Proceedings

New threats to health data privacy

Fengjun Li (1), Xukai Zou (2), Peng Liu (3) and Jake Y Chen (2)

Author Affiliations

1 Department of EECS, The University of Kansas, Lawrence, Kansas, USA

2 Department of Computer and Information Science, IUPUI, Indianapolis, Indiana, USA

3 College of IST, The Pennsylvania State University, University Park, Pennsylvania, USA

BMC Bioinformatics 2011, 12(Suppl 12):S7  doi:10.1186/1471-2105-12-S12-S7

Published: 24 November 2011



With the rapid digitalization of health data (e.g. Electronic Health Records), there is increasing concern about maintaining data privacy while still reaping the benefits, especially when the data must be published for secondary use. Most current research on protecting health data privacy centers on data de-identification and data anonymization, which remove identifiable information from published health data to prevent an adversary from inferring private information about patients. However, published health data are not the only source adversaries can draw on: given the large amount of information that people voluntarily share on the Web, sophisticated attacks that join disparate pieces of information from multiple sources against health data privacy become practical. So far, little effort has been devoted to studying these attacks.


We study how patient privacy could be compromised with the help of today’s information technologies. In particular, we show that private healthcare information could be collected by aggregating and associating disparate pieces of information from multiple online data sources, including online social networks, public records and search engine results. We present a real-world case study showing that user identity and privacy are highly vulnerable to attribution, inference and aggregation attacks. Using real data analysis, we also show that people are highly identifiable to adversaries even when the information pieces about the target are inaccurate.


We claim that so much information has been made available electronically and online that people are very vulnerable without effective privacy protection.