Disseminating massive frequency tables by masking aggregated cell frequencies

Kwon, Sunghoon
DOI: https://doi.org/10.1007/s42952-023-00248-x
2024-01-30
Journal of the Korean Statistical Society
Abstract: We propose a confidential approach for disseminating frequency tables constructed for any combination of key variables in the given microdata, including those of hierarchical key variables. The system generates all possible frequency tables by either marginalizing or aggregating fully joint frequency tables of key variables while protecting the original cells with low frequencies through two masking steps: the small cell adjustments for joint tables followed by the proposed algorithm called information loss bounded aggregation for aggregated cells. The two-step approach is designed to control both disclosure risk and information loss by ensuring the k-anonymity of original cells with small frequencies while keeping the loss within a bounded limit.
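
To make the first masking step and the marginalization concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not specify the small cell adjustment rule or the information loss bounded aggregation algorithm, so the sketch assumes a simple deterministic rule (counts between 1 and k-1 are moved to the nearer of 0 and k, with ties going to k) and does not attempt the second masking step at all. The function names `small_cell_adjust` and `marginalize`, the toy table, and the choice k = 3 are all hypothetical.

```python
from collections import Counter


def small_cell_adjust(joint, k):
    """Assumed rule, not the paper's: move each count in 1..k-1 to the
    nearer of 0 and k (ties go to k), so no published joint cell has a
    frequency strictly between 0 and k."""
    adjusted = {}
    for cell, n in joint.items():
        if 0 < n < k:
            adjusted[cell] = k if 2 * n >= k else 0
        else:
            adjusted[cell] = n
    return adjusted


def marginalize(joint, keep):
    """Sum a joint table over every key variable whose position is not in keep."""
    out = Counter()
    for cell, n in joint.items():
        out[tuple(cell[i] for i in keep)] += n
    return dict(out)


# Toy fully joint table over two hypothetical key variables (region, sex).
joint = {("A", "F"): 1, ("A", "M"): 7, ("B", "F"): 2, ("B", "M"): 5}
k = 3

masked_joint = small_cell_adjust(joint, k)
true_margin = marginalize(joint, keep=[0])
masked_margin = marginalize(masked_joint, keep=[0])

# Absolute information loss per aggregated cell after the first step only.
loss = {cell: abs(masked_margin[cell] - true_margin[cell]) for cell in true_margin}
```

In this toy case each regional total is off by one after the adjustment. Errors of this kind accumulate as more adjusted cells are summed, which is the loss that the second step described in the abstract is designed to keep within a bound.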
statistics & probability