Personalized Differential Privacy for Ridge Regression

Krishna Acharya, Franziska Boenisch, Rakshit Naidu, Juba Ziani
2024-01-31
Abstract: The increased application of machine learning (ML) in sensitive domains requires protecting the training data through privacy frameworks, such as differential privacy (DP). DP requires specifying a uniform privacy level $\varepsilon$ that expresses the maximum privacy loss that each data point in the entire dataset is willing to tolerate. Yet, in practice, different data points often have different privacy requirements. Having to set one uniform privacy level is usually too restrictive, often forcing a learner to guarantee the most stringent privacy requirement, at a large cost to accuracy. To overcome this limitation, we introduce our novel Personalized-DP Output Perturbation method (PDP-OP), which enables training Ridge regression models with individual per-data-point privacy levels. We provide rigorous privacy proofs for PDP-OP as well as accuracy guarantees for the resulting model. This work is the first to provide such theoretical accuracy guarantees for personalized DP in machine learning, whereas previous work only provided empirical evaluations. We empirically evaluate PDP-OP on synthetic and real datasets and with diverse privacy distributions. We show that by enabling each data point to specify its own privacy requirement, we can significantly improve the privacy-accuracy trade-offs in DP. We also show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al. (2015).
Machine Learning, Cryptography and Security, Computers and Society
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses how to preserve model accuracy while protecting individual privacy when machine learning is applied to sensitive datasets. The standard Differential Privacy (DP) framework requires a single uniform privacy level \(\epsilon\) for the entire dataset. Because different data points may have different privacy requirements, a uniform level must satisfy the most stringent one, which is often too strict and causes a significant drop in model accuracy. To overcome this limitation, the paper introduces a Personalized-DP Output Perturbation method (PDP-OP) that lets each data point specify its own privacy level. With this method, the paper aims to significantly improve the privacy-accuracy trade-off in differential privacy and to provide theoretical accuracy guarantees.

### Main contributions

1. **A personalized differential privacy algorithm**:
   - The paper proposes the first personalized differential privacy algorithm (PDP-OP) specifically for Ridge regression.
2. **Rigorous privacy proofs and accuracy guarantees**:
   - It provides rigorous privacy proofs and theoretical accuracy guarantees, which previous work on personalized DP lacked.
3. **Extensive experimental evaluation**:
   - Experiments on synthetic and real datasets show that PDP-OP significantly outperforms the standard output perturbation method on multiple datasets and under different privacy distributions.
   - A comparison with the personalized privacy techniques of Jorgensen et al. (2015) shows that PDP-OP achieves a better privacy-accuracy trade-off.

### Technical details

#### Personalized differential privacy for ridge regression

- **Dataset assumptions**:
  - Consider the dataset \(D=\{(x_i,y_i)\in X\times Y:i = 1,2,\cdots,n\}\), where \(x_i\in[0,1]^d\) and \(y_i\in[-1,1]\).
  - Each data point \(i\) has a privacy requirement \(\epsilon_i>0\); the smaller \(\epsilon_i\) is, the more stringent the privacy requirement.
- **Objectives**:
  - Seek the parameter \(\bar{\theta}\) that minimizes the ridge loss \(L(\theta,\lambda)=\frac{1}{n}\sum_{i = 1}^n(y_i-\theta^{\top}x_i)^2+\lambda\|\theta\|_2^2\).
  - Instead of releasing \(\bar{\theta}\) directly, release a private estimate \(\hat{\theta}\) that meets the privacy requirement \(\epsilon_i\) of every data point \(i\).

#### Main algorithms

- **PDP-OP algorithm**:
  - First, compute the non-private estimate \(\bar{\theta}\):
    \[ \bar{\theta}=\arg\min_{\theta}\sum_{i = 1}^n w_i(y_i-\theta^{\top}x_i)^2+\lambda\|\theta\|_2^2 \]
  - Then sample noise \(Z\) whose density function is \(\nu(b)\propto\exp(-\eta\|b\|^2)\).
  - Return the private estimate \(\hat{\theta}=\bar{\theta}+Z\).
- **Weight selection**:
  - The weight \(w_i\) is chosen as \(w_i=\frac{\epsilon_i}{\sum_{j = 1}^n\epsilon_j}\).
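
The three PDP-OP steps and the weight rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pdp_op` and its signature are hypothetical, the weighted ridge problem is solved through its normal equations \((X^{\top}WX+\lambda I)\theta = X^{\top}Wy\) with \(W=\mathrm{diag}(w)\), and the noise density is taken exactly as written in this summary, \(\nu(b)\propto\exp(-\eta\|b\|^2)\), which is an isotropic Gaussian with per-coordinate variance \(1/(2\eta)\). The summary does not say how \(\eta\) is calibrated to the privacy levels, so here it is simply a parameter supplied by the caller, and the sketch by itself makes no privacy claim.

```python
import numpy as np


def pdp_op(X, y, eps, lam, eta, rng=None):
    """Illustrative sketch of PDP-OP (hypothetical helper, not the paper's code).

    X   : (n, d) features, assumed to lie in [0, 1]^d
    y   : (n,) labels, assumed to lie in [-1, 1]
    eps : (n,) per-point privacy levels, all > 0
    lam : ridge penalty lambda > 0
    eta : noise parameter; its calibration is NOT specified in the summary
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    eps = np.asarray(eps, dtype=float)
    if np.any(eps <= 0):
        raise ValueError("every privacy level eps_i must be positive")

    # Weight selection: w_i = eps_i / sum_j eps_j, so the weights sum to 1.
    w = eps / eps.sum()

    # Step 1: non-private weighted ridge estimate via the normal equations
    # (X^T W X + lam * I) theta = X^T W y.
    d = X.shape[1]
    A = X.T @ (w[:, None] * X) + lam * np.eye(d)
    b = X.T @ (w * y)
    theta_bar = np.linalg.solve(A, b)

    # Step 2: noise with density proportional to exp(-eta * ||b||^2),
    # i.e. a Gaussian with standard deviation sqrt(1 / (2 * eta)) per coordinate.
    z = rng.normal(loc=0.0, scale=np.sqrt(1.0 / (2.0 * eta)), size=d)

    # Step 3: release the perturbed estimate.
    return theta_bar + z, theta_bar, w


# Small demo on synthetic data: half the points are strict (eps = 0.1),
# half are lenient (eps = 1.0), so lenient points get 10x the weight.
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.uniform(0.0, 1.0, size=(200, 3))
y_demo = np.clip(X_demo @ np.array([0.5, -0.3, 0.2]), -1.0, 1.0)
eps_demo = np.concatenate([np.full(100, 0.1), np.full(100, 1.0)])
theta_hat, theta_bar, w = pdp_op(X_demo, y_demo, eps_demo, lam=0.1, eta=50.0, rng=demo_rng)
print("non-private estimate:", theta_bar)
print("released estimate:   ", theta_hat)
```

Points with larger \(\epsilon_i\) (weaker privacy requirements) receive larger weights and therefore influence \(\bar{\theta}\) more, while points with strict requirements contribute less to the fit.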