Table 1

Linkage scenarios by identifiers and string comparison techniques applied to names
String comparison techniques applied to first names and surnames (columns):

| Identifiers used | Exact | JW ≥ 0.7 | JW ≥ 0.9 | DM | Soundex | JW ≥ 0.9 or DM or Soundex |
|---|---|---|---|---|---|---|
| Routinely collected identifiers* | S1 | S2 | S3 | S4 | S5 | S6 |
| Routinely collected identifiers + household member first name | S7 | S8 | S9 | S10 | S11 | S12 |

Further scenarios:

| Identifiers used / linkage approach | Scenarios |
|---|---|
| Routinely collected identifiers + household member first name and surname | S13, S14, S15 |
| Deterministic linkage on National ID Number or telephone number followed by best of S1-S15** | S16 |
| S16 + clerical review of 5%, 10%, 15%, and 20% of record pairs above and below the threshold value above which record pairs are automatically accepted as matches | S17-S20 |

*Routinely collected identifiers = first name, last name, sex, day of birth, month of birth, year of birth and village; JW = Jaro-Winkler; DM = double metaphone code.
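
To make the JW thresholds in the table concrete, the following is a minimal sketch of the standard Jaro-Winkler similarity and a thresholded name comparison. It is illustrative only and is not taken from the study: the function names (`jaro`, `jaro_winkler`, `names_agree`), the upper-casing of names, and the prefix scale of 0.1 with a 4-character prefix cap are assumptions based on the commonly used definition of the measure.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def names_agree(a: str, b: str, threshold: float) -> bool:
    """Hypothetical helper: treat two names as agreeing if JW >= threshold."""
    return jaro_winkler(a.upper(), b.upper()) >= threshold


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))   # 0.9611
print(names_agree("Dixon", "Dicksonx", 0.7))        # True
print(names_agree("Dixon", "Dicksonx", 0.9))        # False
```

Under this sketch, a pair such as "Dixon" and "Dicksonx" (JW about 0.81) would agree under the JW ≥ 0.7 column but not under JW ≥ 0.9, which shows how the looser threshold admits more spelling variation.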

**The best of the 15 probabilistic linkage scenarios is the one that yields the maximum sensitivity and PPV.
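
The two criteria named in this footnote can be computed from linkage outcome counts as sketched below. This is a minimal illustration using the standard definitions; the function names and the counts in the usage lines are hypothetical and are not results from the study. How the two measures are combined to rank scenarios is not stated here, so the sketch only computes them.

```python
def sensitivity(true_matches_found: int, true_matches_missed: int) -> float:
    """Share of all true matches that the linkage identified: TP / (TP + FN)."""
    total = true_matches_found + true_matches_missed
    return true_matches_found / total if total else 0.0


def ppv(true_matches_found: int, false_matches: int) -> float:
    """Share of accepted links that are true matches: TP / (TP + FP)."""
    total = true_matches_found + false_matches
    return true_matches_found / total if total else 0.0


# Hypothetical counts for one scenario (made up for illustration).
tp, fn, fp = 90, 10, 30
print(sensitivity(tp, fn))  # 0.9
print(ppv(tp, fp))          # 0.75
```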

Kabudula et al.

Kabudula et al. BMC Medical Research Methodology 2014 14:71   doi:10.1186/1471-2288-14-71
