Abstract: The increased application of machine learning (ML) in sensitive domains requires protecting the training data through privacy frameworks, such as differential privacy (DP). DP requires specifying a uniform privacy level $\varepsilon$ that expresses the maximum privacy loss that each data point in the entire dataset is willing to tolerate. Yet, in practice, different data points often have different privacy requirements. Having to set one uniform privacy level is usually too restrictive, often forcing a learner to guarantee the most stringent privacy requirement, at a large cost to accuracy. To overcome this limitation, we introduce our novel Personalized-DP Output Perturbation method (PDP-OP), which enables training Ridge regression models with individual per-data-point privacy levels. We provide rigorous privacy proofs for PDP-OP as well as accuracy guarantees for the resulting model. This work is the first to provide such theoretical accuracy guarantees for personalized DP in machine learning, whereas previous work only provided empirical evaluations. We empirically evaluate PDP-OP on synthetic and real datasets and with diverse privacy distributions. We show that by enabling each data point to specify its own privacy requirement, we can significantly improve the privacy-accuracy trade-offs in DP. We also show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al. (2015).

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the problem of maintaining model accuracy while protecting individual privacy when machine learning is applied to sensitive datasets. Specifically, the standard Differential Privacy (DP) framework requires setting a single uniform privacy level $\epsilon$ for the entire dataset. Because different data points may have different privacy requirements, a uniform level must satisfy the strictest of them, which often leads to a significant decline in model accuracy. To overcome this limitation, the paper introduces a new Personalized-DP Output Perturbation method (PDP-OP) that allows each data point to specify its own privacy level. With this method, the paper aims to significantly improve the privacy-accuracy trade-off in differential privacy and to provide theoretical accuracy guarantees.

### Main contributions

1. **Proposing a personalized differential privacy algorithm**:
   - The paper proposes the first personalized differential privacy algorithm (PDP-OP) specifically for Ridge Regression.
2. **Rigorous privacy proofs and accuracy guarantees**:
   - It provides rigorous privacy proofs and accuracy guarantees, which were lacking in previous research.
3. **Extensive experimental evaluation**:
   - The paper evaluates PDP-OP on synthetic and real datasets, showing that it significantly outperforms the standard output perturbation method on multiple datasets and under different privacy distributions.
   - It compares PDP-OP with the personalized privacy technique of Jorgensen et al., showing that PDP-OP achieves a better privacy-accuracy trade-off.

### Technical details

#### Personalized differential privacy for ridge regression

- **Dataset assumptions**:
  - Consider the dataset $D=\{(x_i,y_i)\in X\times Y:i = 1,2,\cdots,n\}$, where $x_i\in[0,1]^d$ and $y_i\in[-1,1]$.
  - Each data point $i$ has a privacy requirement $\epsilon_i>0$; the smaller $\epsilon_i$ is, the more stringent the privacy requirement.
- **Objectives**:
  - Seek the parameter $\bar{\theta}$ that minimizes the ridge loss $L(\theta,\lambda)=\frac{1}{n}\sum_{i = 1}^n(y_i-\theta^{\top}x_i)^2+\lambda\|\theta\|_2^2$.
  - Instead of directly releasing $\bar{\theta}$, provide a private estimate $\hat{\theta}$ while ensuring that the privacy requirement $\epsilon_i$ of each data point $i$ is met.

#### Main algorithms

- **PDP-OP algorithm**:
  - First, calculate the non-private estimate $\bar{\theta}$:
    \[ \bar{\theta}=\arg\min_{\theta}\sum_{i = 1}^n w_i(y_i-\theta^{\top}x_i)^2+\lambda\|\theta\|_2^2 \]
  - Then add noise $Z$ whose density function is $\nu(b)\propto\exp(-\eta\|b\|^2)$.
  - Return the private estimate $\hat{\theta}=\bar{\theta}+Z$.
- **Weight selection**:
  - The weight $w_i$ is selected as $w_i=\frac{\epsilon_i}{\sum_{j = 1}^n\epsilon_j}$.
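
The three PDP-OP steps listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `pdp_op_sketch` and its arguments are hypothetical. The weighted ridge objective is solved through its normal equations $(X^{\top}WX+\lambda I)\theta=X^{\top}Wy$ with $W=\mathrm{diag}(w_1,\dots,w_n)$, and the noise density is taken exactly as written in this summary, $\nu(b)\propto\exp(-\eta\|b\|^2)$, which corresponds to independent Gaussian coordinates with variance $1/(2\eta)$. The summary does not say how $\eta$ is calibrated to the privacy levels $\epsilon_i$, so the sketch takes $\eta$ as an input and makes no privacy claim on its own.

```python
import numpy as np


def pdp_op_sketch(X, y, eps, lam, eta, rng=None):
    """Illustrative sketch of the PDP-OP steps (hypothetical helper).

    X: (n, d) features, y: (n,) labels, eps: (n,) per-point privacy levels,
    lam: ridge penalty, eta: noise parameter (its calibration is NOT covered
    by the summary and is assumed to be supplied by the caller).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    eps = np.asarray(eps, dtype=float)
    if np.any(eps <= 0):
        raise ValueError("every privacy level eps_i must be positive")
    if lam <= 0 or eta <= 0:
        raise ValueError("lam and eta must be positive")

    # Weight selection: w_i = eps_i / sum_j eps_j
    w = eps / eps.sum()

    # Non-private weighted ridge estimate via the normal equations
    # (X^T W X + lam * I) theta = X^T W y
    d = X.shape[1]
    A = X.T @ (w[:, None] * X) + lam * np.eye(d)
    b = X.T @ (w * y)
    theta_bar = np.linalg.solve(A, b)

    # Noise with density proportional to exp(-eta * ||b||^2), as stated in
    # the summary: i.i.d. Gaussian coordinates with variance 1 / (2 * eta).
    sigma = np.sqrt(1.0 / (2.0 * eta))
    Z = rng.normal(0.0, sigma, size=d)

    # Private estimate: theta_hat = theta_bar + Z
    return theta_bar, theta_bar + Z, w
```

A hypothetical call would pass features in $[0,1]^d$, labels in $[-1,1]$, and one positive $\epsilon_i$ per row, for example `pdp_op_sketch(X, y, eps, lam=0.1, eta=100.0)`. Points with larger $\epsilon_i$ (weaker privacy requirements) receive larger weights and therefore influence $\bar{\theta}$ more.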

Personalized Differential Privacy for Ridge Regression
