Abstract:<p>Privacy preservation is a paramount concern when publishing datasets that contain sensitive information. Preventing privacy disclosure and providing useful information to legitimate users for data analysis and mining are conflicting goals. <em>Randomized response</em> is a class of techniques that perturbs each sensitive value in a certain way, so that personal privacy is protected while the overall trends of the entire dataset remain recoverable. However, existing randomized response techniques do not allow the level of privacy protection to be configured flexibly, support only a few types of aggregate queries, and cannot achieve the best answer accuracy from perturbed data. These drawbacks impair the effectiveness of those techniques. This paper proposes a general framework based on randomized response techniques, which has good flexibility and extensibility, and can improve the effectiveness of randomized response methods. Our approach is validated by extensive experiments and comparison with existing randomized response and generalization methods.</p>
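To make the idea of randomized response concrete, the following is a minimal sketch for a single Boolean attribute. It uses the classic flip-with-probability scheme, not the framework proposed in the paper; the function names `perturb` and `estimate_true_fraction` and the retention probability `p` are illustrative assumptions.

```python
import random
from typing import Sequence


def perturb(value: bool, p: float, rng: random.Random) -> bool:
    """Report the true value with probability p, otherwise report its negation."""
    return value if rng.random() < p else not value


def estimate_true_fraction(reported: Sequence[bool], p: float) -> float:
    """Unbiased estimate of the fraction of True values before perturbation."""
    if p == 0.5:
        raise ValueError("p = 0.5 removes all information; no estimate is possible")
    observed = sum(reported) / len(reported)
    # E[observed] = p * pi + (1 - p) * (1 - pi); solve this for pi.
    return (observed - (1 - p)) / (2 * p - 1)


# Hypothetical data: 10,000 records, 30% of which hold the sensitive value.
rng = random.Random(0)
truth = [i < 3000 for i in range(10000)]
reported = [perturb(v, 0.8, rng) for v in truth]
estimate = estimate_true_fraction(reported, 0.8)
```

No individual report reveals the true value with certainty, yet `estimate` should land close to 0.3. Moving `p` toward 0.5 strengthens privacy and increases the variance of the estimate.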

What problem does this paper attempt to address?

The paper addresses the problem of protecting individual privacy when releasing datasets that contain sensitive information. Specifically, the authors focus on how to provide useful information to legitimate users for data analysis or mining while preventing privacy breaches. Existing randomized response techniques cannot flexibly configure the level of privacy protection, support only a few types of aggregate queries, and cannot obtain the best answer accuracy from perturbed data, all of which limits their effectiveness. The paper therefore proposes a general framework based on randomized response techniques, aimed at improving the flexibility, extensibility, and effectiveness of randomized response methods. The main contributions of the paper are:

1. A general framework for data release based on randomized response techniques, which reduces the computational complexity of reconstructing unbiased estimated answers from exponential to linear by using matrix decomposition and the properties of the Kronecker product.
2. A general method for constructing recovery matrices from arbitrary perturbation matrices, which can minimize the variance of the unbiased estimated answers.
3. Perturbation and reconstruction algorithms for Boolean and categorical attributes, together with theoretical analysis. These algorithms can be extended to numerical attributes.
4. Validation of the proposed framework through extensive experiments and comparisons with existing randomized response and generalization methods.
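The role of the Kronecker product can be illustrated with a small sketch. It assumes that each Boolean attribute is perturbed independently, so the joint perturbation matrix is the Kronecker product of the per-attribute matrices, and it relies on the standard identity (P1 ⊗ P2)⁻¹ = P1⁻¹ ⊗ P2⁻¹. This is not the paper's algorithm; the helper names and the probabilities 0.8 and 0.7 are hypothetical.

```python
def flip_matrix(p):
    """Perturbation matrix with entry [i][j] = Pr(report i | true value j)."""
    return [[p, 1 - p], [1 - p, p]]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


# Two independently perturbed Boolean attributes (assumed probabilities).
p1, p2 = flip_matrix(0.8), flip_matrix(0.7)

# Hypothetical joint counts for (a1, a2) = 00, 01, 10, 11.
true_counts = [40.0, 10.0, 30.0, 20.0]

# Expected counts after perturbation under the joint matrix P1 (x) P2.
expected_perturbed = matvec(kron(p1, p2), true_counts)

# The recovery matrix is assembled from the small per-attribute inverses only;
# the 4x4 joint matrix itself is never inverted.
recovery = kron(inverse_2x2(p1), inverse_2x2(p2))
recovered = matvec(recovery, expected_perturbed)
```

For clarity the sketch still materializes the 4x4 recovery matrix. With many attributes, an implementation would presumably apply each small inverse along its own attribute instead of building the full matrix, which is where the complexity saving would come from.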

A general framework for privacy-preserving of data publication based on randomized response techniques

A Privacy Framework

A Privacy Framework: Indistinguishable Privacy

A Novel Privacy Preserving Method for Data Publication

Anonymity-preserving data collection.

A Privacy Protection Model of Data Publication Based on Game Theory

When Differential Privacy Meets Randomized Perturbation: A Hybrid Approach For Privacy-Preserving Recommender System

A Summary of Privacy-Preserving Data Publishing in the Local Setting

Multi-level Privacy Preserving Data Publishing

A combined random noise perturbation approach for multi level privacy preservation in data mining

A General Framework for Privacy-Preserving Distributed Greedy Algorithm.

Randomized Response Mechanisms for Differential Privacy Data Analysis: Bounds and Applications

An Effective Method for Privacy Preserving Association Rule Mining

Data Level Privacy Preserving: A Stochastic Perturbation Approach based on Differential Privacy

Privacy-aware Data Publishing Against Sparse Estimation Attack

DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing

A Data Publishing System Based on Privacy Preservation

A Collaborative Mechanism for Private Data Publication in Smart Cities

Inference Analysis in Privacy-Preserving Data Re-publishing

Privacy-preserving data publishing: an information-driven distributed genetic algorithm