Open Access Research article

Validating self-report of diabetes use by participants in the 45 and up study: a record linkage study

Elizabeth Jean Comino1*, Duong Thuy Tran2, Marion Haas3, Jeff Flack4, Bin Jalaludin5,6, Louisa Jorm2 and Mark Fort Harris1

Author Affiliations

1 Centre for Primary Health Care and Equity, University of New South Wales, Sydney, NSW 2052, Australia

2 Centre for Health Research, School of Medicine, University of Western Sydney, Locked Bag 1797, Penrith, NSW 2751, Australia

3 Centre for Health Economics Research and Evaluation, Faculty of Business, University of Technology, Sydney, PO Box 123, Broadway, NSW 2007, Level 4, 645 Harris Street, Ultimo, NSW 2007, Australia

4 Diabetes Centre, Bankstown-Lidcombe Hospital, Eldridge Road, Bankstown, NSW 2200, Australia

5 Centre for Research, Evidence Management and Surveillance, Sydney and South Western Sydney Local Health Districts, Locked Bag 7017, Liverpool, NSW 1871, Australia

6 School of Public Health and Community Medicine, University of New South Wales, Sydney 2052, Australia


BMC Health Services Research 2013, 13:481  doi:10.1186/1472-6963-13-481

Published: 19 November 2013



Prevalence studies usually depend on self-report of disease status in survey data or on administrative data collections, and may over- or under-estimate disease prevalence. The establishment of a linked data collection provided an opportunity to explore the accuracy and completeness of capture of information about diabetes in survey and administrative data collections.


Baseline questionnaire data collected at recruitment to the 45 and Up Study were obtained for 266,848 adults aged 45 years and over, sampled from New South Wales, Australia, in 2006–2009. These data were linked to administrative data on hospitalisation from the Admitted Patient Data Collection (APDC) for 2000–2009, and to claims for medical services (Medicare Benefits Schedule, MBS) and pharmaceuticals (Pharmaceutical Benefits Scheme, PBS) from Medicare Australia data for 2004–2009. Diabetes status was determined from the response to the question ‘Has a doctor EVER told you that you have diabetes?’ (n = 23,981) and augmented by examination of free-text fields about diagnosis (n = 119) or use of insulin (n = 58). These data were used to identify the sub-group with type 1 diabetes. We explored the agreement between self-report of diabetes, diabetes diagnostic codes in APDC data, claims for glycosylated haemoglobin (HbA1c) testing in MBS data, and claims for dispensed medication (oral hypoglycaemic agents and insulin) in PBS data.
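The self-report case definition described above can be illustrated with a minimal Python sketch. It assumes simple keyword matching on the free-text fields; the abstract does not state how the free text was examined, so the `Participant` record, its field names and the matching rule are hypothetical and serve only to show the 'question response, augmented by free text' logic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    # Hypothetical record layout; these field names are not from the study.
    told_by_doctor: bool                  # 'yes' to the doctor-diagnosis question
    free_text_diagnosis: str = ""         # free-text entry about other conditions
    medications: List[str] = field(default_factory=list)  # free-text medication list


def has_self_reported_diabetes(p: Participant) -> bool:
    """Classify a participant as self-reporting diabetes.

    Assumed rule: a 'yes' to the question, OR a free-text diagnosis
    mentioning diabetes, OR insulin listed among medications.
    """
    if p.told_by_doctor:
        return True
    if "diabetes" in p.free_text_diagnosis.lower():
        return True
    return any("insulin" in m.lower() for m in p.medications)
```

Under this assumed rule, a participant who answered 'no' to the question but listed insulin among their medications would still be counted as having diabetes.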


Most participants with diabetes were identified in APDC data if admitted to hospital (79.3%), in MBS data with at least one claim for HbA1c testing (84.7%; 73.4% if two tests were claimed), or in PBS data through a claim for diabetes medication (71.4%). Using these alternate data collections as an imperfect ‘gold standard’, we calculated sensitivities of 83.7% for APDC, 63.9% (80.5% for two tests) for MBS and 96.6% for PBS data, and specificities of 97.7%, 98.4% and 97.1%, respectively. The lower sensitivity for HbA1c may reflect the use of this test to screen for diabetes, suggesting that HbA1c claims are less useful for identifying people with diabetes without additional information. Kappa values were 0.80, 0.70 and 0.80 for APDC, MBS and PBS, respectively, reflecting the large population sample under consideration. Agreement between self-report and APDC data in identifying type 1 diabetes status was poor.
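For readers less familiar with these agreement measures, the following Python sketch shows how sensitivity, specificity and Cohen's kappa are conventionally computed from a 2×2 cross-tabulation of self-report against a reference data collection. The class and function names are illustrative, and the counts in the usage note are invented for the arithmetic only; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # self-report yes, reference yes
    fp: int  # self-report yes, reference no
    fn: int  # self-report no,  reference yes
    tn: int  # self-report no,  reference no


def sensitivity(t: TwoByTwo) -> float:
    # Proportion of reference-positive participants who self-report diabetes.
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    # Proportion of reference-negative participants who do not self-report diabetes.
    return t.tn / (t.tn + t.fp)


def cohen_kappa(t: TwoByTwo) -> float:
    # Observed agreement corrected for the agreement expected by chance.
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n
    chance_yes = ((t.tp + t.fp) / n) * ((t.tp + t.fn) / n)
    chance_no = ((t.fn + t.tn) / n) * ((t.fp + t.tn) / n)
    expected = chance_yes + chance_no
    return (observed - expected) / (1 - expected)


# Invented counts, for illustration only (not study data).
example = TwoByTwo(tp=80, fp=20, fn=10, tn=890)
```

With these invented counts, sensitivity is 80/90, specificity is 890/910, and kappa is (0.97 − 0.828)/(1 − 0.828), or roughly 0.83.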


Self-report of diagnosis, augmented with free-text data indicating diabetes as a chronic condition and/or use of insulin among medications used, identified participants with diabetes with high sensitivity and specificity compared with available administrative data collections.

Keywords: Primary health care; Cohort studies; Diabetes mellitus; Record linkage; Health service data; Quality of health care; Validation study; Sensitivity and specificity; Older age; English language