Open Access Highly Accessed Research article

Data Linkage: A powerful research tool with potential problems

Megan A Bohensky1*, Damien Jolley1, Vijaya Sundararajan2, Sue Evans1, David V Pilcher3, Ian Scott4 and Caroline A Brand1

Author Affiliations

1 Centre of Research Excellence in Patient Safety, Dept of Epidemiology & Preventive Medicine, School of Public Health & Preventive Medicine, Monash University, Melbourne, Victoria, Australia, 3181

2 Department of Health Victoria, 50 Lonsdale Street, Melbourne Victoria, Australia 3000

3 Australian & New Zealand Intensive Care Society, Centre for Outcomes and Resource Evaluation, 10 Ievers Terrace, Carlton Victoria, Australia 3053

4 Department of Internal Medicine, Princess Alexandra Hospital, Brisbane, Queensland, Australia 4102

BMC Health Services Research 2010, 10:346  doi:10.1186/1472-6963-10-346

Published: 22 December 2010



Policy makers, clinicians and researchers are demonstrating increasing interest in using data linked from multiple sources to support measurement of clinical performance and patient health outcomes. However, the utility of data linkage may be compromised by sub-optimal or incomplete linkage, leading to systematic bias. In this study, we synthesize the evidence identifying participant or population characteristics that can influence the validity and completeness of data linkage and may be associated with systematic bias in reported outcomes.


A narrative review using structured search methods was undertaken. The key words "data linkage" and the MeSH term "medical record linkage" were applied to the Medline, EMBASE and CINAHL databases for the period 1991 to 2007. Abstracts were included if the article attempted an empirical evaluation of methodological issues relating to data linkage and reported on patient characteristics, the study design included analysis of matched versus unmatched records, and the report was in English. Included articles were grouped thematically according to the patient characteristics that were compared between matched and unmatched records.
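As an illustration of what an analysis of matched versus unmatched records can look like, the following minimal sketch computes the proportion of records that linked successfully within each level of one patient characteristic. It is not taken from any of the reviewed studies: the function name `linkage_rate_by_group`, the field names `age_group` and `matched`, and the toy counts are all hypothetical.

```python
from collections import Counter


def linkage_rate_by_group(records, key):
    """Return {group: fraction of records that linked} for one characteristic.

    `records` is an iterable of dicts; each dict needs the field named by
    `key` and a boolean "matched" field (both names are illustrative).
    """
    totals = Counter()
    linked = Counter()
    for rec in records:
        group = rec[key]
        totals[group] += 1
        if rec["matched"]:
            linked[group] += 1
    return {group: linked[group] / totals[group] for group in totals}


# Hypothetical toy data: 100 records per age group, invented for illustration.
records = (
    [{"age_group": "under_65", "matched": True}] * 90
    + [{"age_group": "under_65", "matched": False}] * 10
    + [{"age_group": "65_plus", "matched": True}] * 70
    + [{"age_group": "65_plus", "matched": False}] * 30
)

rates = linkage_rate_by_group(records, "age_group")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of records linked")
```

If the linkage rate differs markedly between groups, as it does in this invented data, the unmatched records are not a random subset, and outcomes estimated from the linked records alone may be biased with respect to that characteristic.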


The search identified 1810 articles, of which 33 (1.8%) met the inclusion criteria. There was marked heterogeneity in study methods and in the factors investigated. Characteristics that were unevenly distributed between matched and unmatched records were: age (72% of studies), sex (50% of studies), race (64% of studies), geographical/hospital site (93% of studies), socio-economic status (82% of studies) and health status (72% of studies).


A number of relevant patient or population factors may be associated with incomplete data linkage, resulting in systematic bias in reported clinical outcomes. Readers should consider these factors when interpreting the reported results of data linkage studies.