The appeal of the gamma family distribution to protect the confidentiality of contingency tables

James Jackson, Robin Mitra, Brian Francis, Iain Dove
DOI: https://doi.org/10.48550/arxiv.2408.02513
2024-08-05
Methodology
Abstract: Administrative databases, such as the English School Census (ESC), are rich sources of information that are potentially useful for researchers. For such data sources to be made available, however, strict guarantees of privacy would be required. To achieve this, synthetic data methods can be used. Such methods, when protecting the confidentiality of tabular data (contingency tables), often utilise the Poisson or Poisson-mixture distributions, such as the negative binomial (NBI). These distributions, however, are either equidispersed (in the case of the Poisson) or overdispersed (e.g. in the case of the NBI), which results in excessive noise being applied to large low-risk counts. This paper proposes the use of the (discretized) gamma family (GAF) distribution, which allows noise to be applied in a more bespoke fashion. Specifically, it allows less noise to be applied as cell counts become larger, providing an optimal balance in relation to the risk-utility trade-off. We illustrate the suitability of the GAF distribution on an administrative-type data set that is reminiscent of the ESC.
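The dispersion argument in the abstract can be illustrated with a small sketch. Everything specific here is an assumption rather than something the abstract states: the NBI is taken to have variance mu + sigma * mu^2, the GAF is taken to have mean mu and variance sigma^2 * mu^nu, discretization is done by rounding to the nearest integer, and the function names and parameter values are hypothetical. Under those assumptions, choosing nu below 1 makes the noise grow more slowly than Poisson noise as the cell count increases.

```python
import random
import statistics


def poisson_sd(mu):
    # Poisson is equidispersed: variance equals the mean.
    return mu ** 0.5


def nbi_sd(mu, sigma):
    # Assumed NBI parameterization: variance = mu + sigma * mu^2 (overdispersed).
    return (mu + sigma * mu ** 2) ** 0.5


def gaf_sd(mu, sigma, nu):
    # Assumed GAF parameterization: variance = sigma^2 * mu^nu.
    return sigma * mu ** (nu / 2)


def sample_discretized_gaf(mu, sigma, nu, n, rng):
    # Draw gamma variates with mean mu and variance sigma^2 * mu^nu,
    # then round to integers (an assumed discretization scheme).
    var = sigma ** 2 * mu ** nu
    shape = mu ** 2 / var   # gamma shape k, since mean = k * theta
    scale = var / mu        # gamma scale theta, since variance = k * theta^2
    return [round(rng.gammavariate(shape, scale)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameter values, chosen only for illustration.
    sigma_nbi, sigma_gaf, nu = 0.1, 1.0, 0.5
    print("count  poisson_sd  nbi_sd  gaf_sd  empirical_gaf_sd")
    for count in (10, 100, 1000, 10000):
        draws = sample_discretized_gaf(count, sigma_gaf, nu, 5000, rng)
        print(
            count,
            round(poisson_sd(count), 2),
            round(nbi_sd(count, sigma_nbi), 2),
            round(gaf_sd(count, sigma_gaf, nu), 2),
            round(statistics.stdev(draws), 2),
        )
```

In this sketch the standard deviation relative to the count shrinks fastest for the gamma family column, which is the behaviour the abstract describes for large, low-risk counts. The parameter values would in practice be set by the risk-utility considerations the paper discusses.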