Abstract: Differentially private stochastic gradient descent (DP-SGD) is the canonical approach to private deep learning. While the current privacy analysis of DP-SGD is known to be tight in some settings, several empirical results suggest that models trained on common benchmark datasets leak significantly less privacy for many datapoints. Yet, despite past attempts, a rigorous explanation for why this is the case has not been reached. Is it because there exist tighter privacy upper bounds when restricted to these dataset settings, or are our attacks not strong enough for certain datapoints? In this paper, we provide the first per-instance (i.e., "data-dependent") DP analysis of DP-SGD. Our analysis captures the intuition that points with similar neighbors in the dataset enjoy better data-dependent privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put all together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints (when trained on common benchmarks) than the current data-independent guarantee. This implies privacy attacks will necessarily fail against many datapoints if the adversary does not have sufficient control over the possible training datasets.

What problem does this paper attempt to address?

This paper addresses the concern that current privacy analyses of Differentially Private Stochastic Gradient Descent (DP-SGD) may overestimate the actual privacy leakage. Although the existing DP-SGD privacy analysis is tight in some settings, many empirical results show that models trained on common benchmark datasets leak far less privacy for many data points than the theoretical analysis predicts. So far there has been no rigorous explanation for this gap: are there tighter privacy upper bounds in these dataset settings, or are existing attacks not strong enough for some data points? The authors provide the first per-instance (i.e., "data-dependent") privacy analysis of DP-SGD to explain why, on some datasets, the actual privacy leakage of DP-SGD is much lower than existing analyses predict.

### Main Contributions

1. **Introducing Sensitivity Distribution**: The authors propose the sensitivity distribution, which captures how much a specific data point changes a mini-batch update. With this distribution, the privacy leakage of a single DP-SGD update can be evaluated more accurately.
2. **New Composition Theorem**: The authors develop a new composition theorem that uses the expected privacy leakage at each step to bound the privacy leakage of the entire training run. This differs from the traditional worst-case analysis and better reflects the privacy behavior of actual training.
3. **Experimental Verification**: Experiments on common benchmark datasets show that the individual privacy guarantees of many data points are far better than the existing data-independent guarantee. For some data points, the ε of the privacy guarantee is several orders of magnitude smaller (i.e., better).
### Technical Details

- **Sensitivity Distribution**: Defined as \(\Delta_{U, x^*}(X_B)=\|U(X_B)-U(X_B\cup\{x^*\})\|_2\), where \(U\) is the update rule, \(X_B\) is a mini-batch, and \(x^*\) is a data point. Computing an \(L_p\) norm of this distribution yields a tighter privacy guarantee.
- **New Composition Theorem**: The traditional Rényi differential privacy composition theorem implicitly uses the maximum of the per-step privacy guarantee (i.e., the \(L_\infty\) norm). The new composition theorem uses an arbitrary \(L_p\) norm and can therefore take advantage of settings where many models have better per-step privacy guarantees than the worst case.

### Experimental Results

- **Better Individual Privacy Guarantees**: On common benchmark datasets, the individual privacy guarantees of many data points are significantly better than the existing data-independent guarantee.
- **Relationship between Classification Performance and Privacy**: Data points that are correctly classified usually have better privacy guarantees, suggesting that better-performing models are, to some extent, also more private.
- **Impact of Mini-batch Sampling Rate**: Under certain update rules, a higher mini-batch sampling rate can provide better individual privacy because mini-batch updates concentrate near the mean of the dataset.

### Significance

- **Unlearning and Generalization**: Stronger individual differential privacy guarantees mean that unlearning requests can be handled more flexibly, and that the generalization and memorization behavior of the model can be quantified more precisely.
- **Privacy Auditing**: This work provides an empirical privacy upper bound, which can be combined with the lower bounds from previous strong privacy attacks to further test the effectiveness of those attacks.
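To make the sensitivity distribution from the Technical Details concrete, the following is a minimal sketch that samples \(\Delta_{U, x^*}(X_B)\) over random mini-batches and compares several \(L_p\) norms of the resulting empirical distribution. Everything beyond the definition itself is an assumption made for illustration: the update rule `mean_update` (mean of clipped per-example gradients), the Poisson-style batch sampling, the synthetic 2-D "gradients", and all function names are hypothetical. The sketch does not compute the paper's privacy bound; it only shows that a point resembling its neighbors has a much smaller sensitivity than an outlier, and that an \(L_p\) norm with finite \(p\) never exceeds the \(L_\infty\) (worst-case) value.

```python
import numpy as np

def clip_rows(g, c):
    """Scale each row of g so that its L2 norm is at most c."""
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g * np.minimum(1.0, c / np.maximum(norms, 1e-12))

def mean_update(grads, c=1.0):
    """Hypothetical update rule U: mean of clipped per-example gradients."""
    return clip_rows(grads, c).mean(axis=0)

def sensitivity_samples(grads, g_star, rate, n_batches, rng, c=1.0):
    """Sample ||U(X_B) - U(X_B + {x*})||_2 over randomly drawn mini-batches."""
    out = []
    while len(out) < n_batches:
        mask = rng.random(len(grads)) < rate  # include each point independently
        if not mask.any():
            continue  # the mean is undefined for an empty batch, so redraw
        batch = grads[mask]
        with_star = np.vstack([batch, g_star[None, :]])
        diff = mean_update(batch, c) - mean_update(with_star, c)
        out.append(np.linalg.norm(diff))
    return np.array(out)

def lp_norm(samples, p):
    """L_p norm of the empirical distribution: (E[Delta^p])^(1/p)."""
    if np.isinf(p):
        return samples.max()
    return np.mean(samples ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
# Synthetic per-example gradients clustered around (1, 0).
grads = rng.normal(loc=[1.0, 0.0], scale=0.1, size=(200, 2))
typical = np.array([1.0, 0.0])   # looks like its neighbors
outlier = np.array([-1.0, 0.0])  # points the opposite way

d_typ = sensitivity_samples(grads, typical, rate=0.1, n_batches=500, rng=rng)
d_out = sensitivity_samples(grads, outlier, rate=0.1, n_batches=500, rng=rng)

for p in (1, 2, 8, np.inf):
    print(f"p={p}: typical={lp_norm(d_typ, p):.4f}  outlier={lp_norm(d_out, p):.4f}")
```

In this toy setup the gap between the \(p=1\) and \(p=\infty\) rows gives a rough sense of how much a worst-case (maximum) analysis can overstate the typical per-step sensitivity.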
In conclusion, this paper provides a new framework for individual privacy analysis of DP-SGD by introducing the sensitivity distribution and a new composition theorem, explaining why, on some datasets, the actual privacy leakage of DP-SGD is far lower than existing theoretical analyses predict. This improves the understanding of privacy protection mechanisms and provides tools and methods for future privacy research and applications.

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy

DPDR: Gradient Decomposition and Reconstruction for Differentially Private Deep Learning

How Private are DP-SGD Implementations?

Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight

Dynamic Differential-Privacy Preserving SGD

It's Our Loss: No Privacy Amplification for Hidden State DP-SGD With Non-Convex Loss

Improving Differentially Private SGD via Randomly Sparsified Gradients

Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification

Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model

Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

DP-SGD with weight clipping

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Privacy Loss of Noisy Stochastic Gradient Descent Might Converge Even for Non-Convex Losses

Towards Efficient and Scalable Training of Differentially Private Deep Learning

The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD

GReDP: A More Robust Approach for Differential Private Training with Gradient-Preserving Noise Reduction

Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach