Table 4

Rules to use for constraining the search for the de-identification solution

| Relationship between maximum acceptable generalization (M) and adversary's background knowledge (Q) | Generalization level to use for suppression (S) | Generalization level to release data at |
| --- | --- | --- |
| M ≥ Q (analysts could accept a generalization equal to or less detailed than the adversary's knowledge) | S = M (suppress at level M) | Only include data generalized to level S = M |
| M < Q (analysts need a version that is more detailed than the adversary's knowledge) | S = Q | The PUMF can have data at level M in the generalization hierarchy, except when it generalizes to a suppressed value at level S = Q. |


The generalizations in the right-hand column can be applied to each quasi-identifier separately. The symbol M denotes the highest generalization hierarchy level acceptable for analysis, Q denotes the generalization hierarchy level representing the adversary's background knowledge, and S denotes the level at which suppression should be performed. In our example, M = 1 and Q = 2, so the M < Q condition is met and we should suppress at level Q and include partially suppressed data at level M.
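The two rules in the table can be sketched as a small decision function. This is only an illustration, not the authors' implementation: the names `Rule` and `table4_rule` are hypothetical, and the sketch assumes hierarchy levels are non-negative integers where a larger number means a more general (less detailed) value, which matches the worked example of M = 1 and Q = 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Outcome of the Table 4 rules for one quasi-identifier (hypothetical type)."""
    suppression_level: int   # S: level at which suppression is performed
    release_level: int       # level at which data appear in the released file
    withhold_by_ancestor: bool  # True when a level-M value is withheld because
                                # its level-S generalization was suppressed


def table4_rule(m: int, q: int) -> Rule:
    """Pick S and the release level from M and Q.

    Assumes integer levels where a higher number is a more general value.
    """
    if m < 0 or q < 0:
        raise ValueError("generalization levels must be non-negative")
    if m >= q:
        # Analysts accept data no more detailed than the adversary's knowledge:
        # suppress and release at the same level, S = M.
        return Rule(suppression_level=m, release_level=m, withhold_by_ancestor=False)
    # M < Q: analysts need more detail than the adversary has. Suppress at
    # S = Q and release level-M values unless they generalize to a value
    # that was suppressed at level Q.
    return Rule(suppression_level=q, release_level=m, withhold_by_ancestor=True)


# The rules apply to each quasi-identifier separately. The variable names and
# (M, Q) pairs below are made up for illustration.
quasi_identifiers = {"age": (1, 2), "region": (2, 2)}
rules = {name: table4_rule(m, q) for name, (m, q) in quasi_identifiers.items()}
```

With the values from the example, `table4_rule(1, 2)` selects suppression at level 2 and release at level 1 with partial suppression, which is the M < Q row of the table.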

Emam et al. BMC Medical Informatics and Decision Making 2011 11:53   doi:10.1186/1472-6947-11-53
