A proposal to increase data utility on Global Differential Privacy data based on data use predictions

Henry C. Nunes, Marlon P. da Silva, Charles V. Neu, Avelino F. Zorzo
2024-01-12
Abstract: This paper presents ongoing research focused on improving the utility of data protected by Global Differential Privacy (DP) in the scenario of summary statistics. Our approach is based on predictions on how an analyst will use statistics released under DP protection, so that a developer can optimise data utility on further usage of the data in the privacy budget allocation. This novel approach can potentially improve the utility of data without compromising privacy constraints. We also propose a metric that can be used by the developer to optimise the budget allocation process.
Cryptography and Security, Databases
What problem does this paper attempt to address?
The problem this paper addresses is how to improve the utility of data protected by global Differential Privacy (DP) without weakening the privacy guarantee. Specifically, the authors propose predicting how analysts will use the released statistics and using those predictions to optimise the allocation of the privacy budget, thereby improving data utility.

### Specific description of the problem

1. **Background and challenges**:
   - Differential Privacy (DP) is a powerful privacy-protection method that has been applied in government systems and at some large companies.
   - However, applying DP can reduce data utility, because the noise it introduces lowers data quality.
   - Improving data utility while maintaining privacy has therefore become an important research topic.
2. **Limitations of existing methods**:
   - Several methods for privacy budget allocation already exist, but they are usually driven by the algorithm or by resource allocation and do not consider the specific statistical methods that data analysts may later apply.
   - As a result, they cannot directly optimise the privacy budget allocation for the needs of data analysts.
3. **Solution proposed in the paper**:
   - The authors propose optimising the privacy budget allocation by predicting how data analysts will use the published statistics.
   - Specifically, based on the predicted usage, developers can allocate a larger share of the privacy budget to the queries that matter most for subsequent analysis, reducing the impact of noise on those queries and improving data utility.
   - To support this method, the authors also propose a metric that helps developers compare different privacy budget allocation schemes and find the best one.
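The idea of giving more budget to the queries that matter most can be illustrated with a minimal sketch. It assumes the Laplace mechanism, where the noise scale of a query is its sensitivity divided by its share of the budget; the summary does not name a mechanism, so this is an assumption. The `weights` that encode predicted importance, and the helper names `allocate_budget` and `laplace_scales`, are hypothetical and not taken from the paper.

```python
from typing import List


def allocate_budget(epsilon: float, weights: List[float]) -> List[float]:
    """Split a total budget epsilon proportionally to positive weights.

    The weights are a hypothetical stand-in for predicted query importance.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be non-empty and strictly positive")
    total = sum(weights)
    return [epsilon * w / total for w in weights]


def laplace_scales(sensitivities: List[float], budgets: List[float]) -> List[float]:
    """Laplace noise scale per query: sensitivity / budget (assumed mechanism)."""
    return [s / b for s, b in zip(sensitivities, budgets)]


epsilon = 1.0
sensitivities = [1.0, 1.0, 1.0]  # illustrative values only

# Baseline: every query gets the same share of the budget.
uniform = allocate_budget(epsilon, [1.0, 1.0, 1.0])
# Prediction-driven: the first query is assumed to matter twice as much.
prioritised = allocate_budget(epsilon, [2.0, 1.0, 1.0])

uniform_scales = laplace_scales(sensitivities, uniform)
prioritised_scales = laplace_scales(sensitivities, prioritised)

print("uniform scales:    ", uniform_scales)
print("prioritised scales:", prioritised_scales)
```

In this sketch the prioritised query receives less noise than under the uniform split, while the other two receive more; the total budget is unchanged, so the overall privacy guarantee is the same.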
### Mathematical model

- **Privacy budget allocation**: Let the privacy parameter be \(\epsilon\), to be allocated across \(n_{sta}\) statistics. Each statistic \(sta_i\) has a corresponding sensitivity \(sen_i\) and an allocated privacy budget \(bud_i\).
  \[ Sta = [sta_1, sta_2, ..., sta_{n_{sta}}] \]
  \[ Sen = [sen_1, sen_2, ..., sen_{n_{sta}}] \]
  \[ Bud = [bud_1, bud_2, ..., bud_{n_{sta}}] \]
- **Constraints**:
  \[ \sum_{i = 1}^{n_{sta}} Bud[i] = \epsilon \]
  \[ \forall i \, (Bud[i] > 0) \]
- **Metric**: Define a metric function \(Metric(Tup, Eqs)\) to evaluate the utility of different privacy budget allocation schemes. Here, \(Tup\) is an array of tuples, each containing a statistic, its sensitivity, and its privacy budget, and \(Eqs\) is an array of tuples, each containing an equation and its sensitivity.
  \[ Metric(Tup, Eqs) = \sum_{i = 1}^{n_{sta}} us(Tup[i]) + \sum_{i = 1}^{n_{eq}} ue(Eqs[i]) \]
  where \(us(t)\) is the utility loss of a single statistic, \(ue(te)\) is the utility loss of a single equation, and \(n_{eq}\) is the number of equations.

### Conclusions and future work

- The authors propose improving data utility by predicting how data analysts will use the released statistics and optimising the privacy budget allocation accordingly.
- To support this method, the authors also propose a metric that helps developers find the best privacy budget allocation scheme.
- Future work includes further theoretical analysis and experimental evaluation, in particular developing specific utility measures and exploring ways to find the optimal privacy budget allocation automatically.

With this method, data utility can potentially be improved without compromising privacy.
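The metric from the mathematical model above can be sketched in code to compare two allocations. The summary leaves \(us\) and \(ue\) unspecified (concrete utility measures are listed as future work), so the choices here are assumptions made purely for illustration: `us` is taken to be the variance of Laplace noise with scale \(sen_i / bud_i\), and `ue` is an equation's sensitivity multiplied by the summed losses of the statistics it uses. Representing an equation as a tuple of statistic indices plus a sensitivity, and passing `tup` into `ue`, are also hypothetical.

```python
from typing import List, Sequence, Tuple

# (name, sensitivity sen_i, budget bud_i)
StatTuple = Tuple[str, float, float]
# (indices of the statistics the equation uses, equation sensitivity)
EqTuple = Tuple[Sequence[int], float]


def us(t: StatTuple) -> float:
    """Assumed utility loss of one statistic: Laplace variance 2 * (sen / bud)^2."""
    _name, sen, bud = t
    return 2.0 * (sen / bud) ** 2


def ue(te: EqTuple, tup: List[StatTuple]) -> float:
    """Assumed utility loss of one equation: its sensitivity times the
    summed losses of the statistics it depends on."""
    indices, eq_sen = te
    return eq_sen * sum(us(tup[i]) for i in indices)


def metric(tup: List[StatTuple], eqs: List[EqTuple]) -> float:
    """Metric(Tup, Eqs) = sum of us over statistics + sum of ue over equations."""
    return sum(us(t) for t in tup) + sum(ue(te, tup) for te in eqs)


def with_budgets(names, sens, buds) -> List[StatTuple]:
    """Bundle names, sensitivities and budgets into the Tup array."""
    return list(zip(names, sens, buds))


names = ["count", "sum", "max"]
sens = [1.0, 1.0, 1.0]     # illustrative sensitivities, not realistic values
eqs = [((0, 1), 3.0)]      # hypothetical predicted use: mean = sum / count

# Both allocations spend the same total budget epsilon = 1.
m_uniform = metric(with_budgets(names, sens, [1 / 3, 1 / 3, 1 / 3]), eqs)
m_prioritised = metric(with_budgets(names, sens, [0.4, 0.4, 0.2]), eqs)

print("uniform:    ", m_uniform)
print("prioritised:", m_prioritised)
```

Because both terms are utility losses, a lower value indicates a better allocation. Under these assumed loss functions, shifting budget toward the two statistics used by the predicted equation lowers the metric, although the outcome depends on the equation's sensitivity: with a small enough value, the uniform split would score better.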