Abstract: Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume the loss function is Lipschitz continuous and convex, sample the batches randomly with replacement, or omit the gradient clipping step. In this work, we analyze the most commonly used variant of DP-SGD, in which we sample batches cyclically with replacement, perform gradient clipping, and only release the last DP-SGD iterate. More specifically, without assuming convexity, smoothness, or Lipschitz continuity of the loss function, we establish new Rényi differential privacy (RDP) bounds for the last DP-SGD iterate under the mild assumptions that (i) the DP-SGD stepsize is small relative to the topological constants in the loss function, and (ii) the loss function is weakly convex. Moreover, we show that our bounds converge to previously established convex bounds when the weak-convexity parameter of the objective function approaches zero. In the case of non-Lipschitz smooth loss functions, we provide a weaker bound that scales well in terms of the number of DP-SGD iterations.
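
The DP-SGD variant described in the abstract (cyclic batches, gradient clipping, release of the last iterate only) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-example `grad_fn` signature, and the choice to add Gaussian noise of standard deviation `sigma` directly to the iterate are all assumptions made for illustration, and real implementations may scale or place the noise differently.

```python
import numpy as np


def clip(g, C):
    """Rescale g so that its l2 norm is at most C (hypothetical helper)."""
    norm = np.linalg.norm(g)
    return g if norm <= C else g * (C / norm)


def dp_sgd_last_iterate(grad_fn, data, x0, step, C, sigma, batch_size, epochs, seed=0):
    """Noisy clipped SGD over cyclically ordered batches; returns only the last iterate.

    grad_fn(x, z) is assumed to return the gradient of the per-example loss at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = len(data)
    for _ in range(epochs):
        # Cyclic traversal: batches are visited in the same fixed order every epoch.
        for start in range(0, n, batch_size):
            batch = data[start:start + batch_size]
            # Clip each per-example gradient, then average over the batch.
            g = np.mean([clip(grad_fn(x, z), C) for z in batch], axis=0)
            # Assumption: isotropic Gaussian noise is added to the updated iterate.
            noise = rng.normal(0.0, sigma, size=x.shape)
            x = x - step * g + noise
    # Intermediate iterates are never returned (hidden-state setting).
    return x


if __name__ == "__main__":
    # Toy per-example loss 0.5 * ||x - z||^2, whose gradient is x - z.
    data = np.array([[1.0], [3.0]])
    x_last = dp_sgd_last_iterate(lambda x, z: x - z, data, x0=[0.0], step=0.5,
                                 C=10.0, sigma=0.1, batch_size=1, epochs=5)
    print(x_last)
```

Only `x_last` leaves the function, which mirrors the last-iterate release that the privacy analysis targets.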

What problem does this paper attempt to address?

This paper addresses the gap between the privacy analysis of differentially private stochastic gradient descent (DP-SGD) and the way the algorithm is implemented in practice. Most existing privacy analyses of DP-SGD assume idealized properties of the loss function, such as convexity, smoothness, and Lipschitz continuity, and these assumptions often fail in practical applications. Existing analyses also usually assume that data batches are sampled randomly and omit the gradient clipping step, neither of which matches actual implementations. The main contributions of the paper are as follows:

1. **Privacy Bounds under Weak Assumptions**: The paper establishes new Rényi differential privacy (RDP) bounds without assuming that the loss function is convex, smooth, or Lipschitz continuous. The bounds only require that the DP-SGD step size is small relative to the topological constants of the loss function and that the loss function is weakly convex.
2. **Convergence of the Bounds**: As the weak-convexity parameter of the objective function approaches zero, the bounds established in the paper converge to the existing convex bounds. For non-Lipschitz smooth loss functions, the paper provides a weaker bound that scales well in the number of DP-SGD iterations.
3. **Parameter Influence**: The paper shows that the privacy bound can be reduced by decreasing the SGD step size, increasing the standard deviation of the Gaussian noise in DP-SGD, or increasing the batch size.

### Problem Definition

The paper focuses on the application of DP-SGD to composite optimization problems:

\[ \min_{x \in \mathbb{R}^n} \left\{ \varphi(x) := \frac{1}{k} \sum_{i = 1}^k f_i(x) + h(x) \right\} \]

where \(h\) is a proper, convex, lower-semicontinuous function, and each \(f_i\) is continuously differentiable on the domain of \(h\). In particular, \(h\) can be a common nonsmooth regularization function, such as the \(\ell_1\) norm \(\|\cdot\|_1\), the nuclear matrix norm \(\|\cdot\|_*\), or the elastic-net regularizer, or it can be the indicator function of a closed convex set.

### Types of Bounds

The paper establishes RDP bounds under three different conditions:

1. **Without Additional Assumptions**: In this case, the bound has the form:
   \[ D_\alpha(X_T \| X'_T) \preceq \alpha \cdot \frac{T (\lambda C)^2}{\sigma^2} \]
2. **Assume that the DP-SGD Iterates Lie within an \(\ell_2\) Ball**: In this case, the bound has the form:
   \[ D_\alpha(X_T \| X'_T) \preceq \alpha \cdot \frac{(d_h + \lambda C / b)^2}{\sigma^2} \]
   where \(d_h\) is the diameter of the domain of \(h\).
3. **Assume that each \(\nabla f_i\) is Lipschitz Continuous**: In this case, the bound has the form:
   \[ D_\alpha(X_T \| X'_T) \preceq \alpha \cdot \frac{T}{\ell} \left( \frac{L^2 \ell}{\lambda} \sum_{i = 1}^\ell \frac{L_i^2}{\lambda} \right) \left( \frac{\lambda C}{b} \right)^2 \]
   where \(L\) and \(L_i\) are finite constants.

### Mathematical Techniques

1. **Weak Convexity**: The paper generalizes the existing analysis for convex, twice-differentiable settings to weakly convex (possibly nonsmooth) composite settings. Using recently developed optimal transport techniques and a curvature characterization of nonconvex functions, it shows how the bounds converge when the weak-convexity parameter is scaled appropriately.
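
To make the parameter dependence of the first two bounds concrete, the following sketch evaluates their right-hand sides. It rests on assumptions the summary does not state explicitly: \(\lambda\) is read as the step size, \(C\) as the clipping norm, \(\sigma\) as the noise standard deviation, \(b\) as the batch size, and \(T\) as the number of iterations, and the constants hidden by \(\preceq\) are ignored. The function names are hypothetical, and the returned values only indicate scaling, not actual privacy guarantees.

```python
def rdp_scaling_no_assumptions(alpha, T, step, C, sigma):
    """Right-hand side alpha * T * (step * C)^2 / sigma^2, up to hidden constants."""
    return alpha * T * (step * C) ** 2 / sigma ** 2


def rdp_scaling_bounded_domain(alpha, d_h, step, C, b, sigma):
    """Right-hand side alpha * (d_h + step * C / b)^2 / sigma^2, up to hidden constants."""
    return alpha * (d_h + step * C / b) ** 2 / sigma ** 2


if __name__ == "__main__":
    base = rdp_scaling_no_assumptions(alpha=2, T=100, step=0.1, C=1.0, sigma=1.0)
    # Halving the step size or doubling the noise scale shrinks the value by 4x.
    print(base, rdp_scaling_no_assumptions(2, 100, 0.05, 1.0, 1.0),
          rdp_scaling_no_assumptions(2, 100, 0.1, 1.0, 2.0))
    # The bounded-domain expression does not involve T and decreases with batch size b.
    print(rdp_scaling_bounded_domain(2, 1.0, 0.1, 1.0, 8, 1.0),
          rdp_scaling_bounded_domain(2, 1.0, 0.1, 1.0, 64, 1.0))
```

Under these assumed readings, the first expression grows linearly in \(T\), while the second is independent of \(T\) but never falls below \(\alpha d_h^2 / \sigma^2\).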

Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

Privacy Loss of Noisy Stochastic Gradient Descent Might Converge Even for Non-Convex Losses

Differentially Private Stochastic Gradient Descent with Fixed-Size Minibatches: Tighter RDP Guarantees with or without Replacement

Improving Differentially Private SGD via Randomly Sparsified Gradients

It's Our Loss: No Privacy Amplification for Hidden State DP-SGD With Non-Convex Loss

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

How Private are DP-SGD Implementations?

Dynamic Differential-Privacy Preserving SGD

Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight

DPDR: Gradient Decomposition and Reconstruction for Differentially Private Deep Learning

DP-SGD with weight clipping

Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness

Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach

Beyond Uniform Lipschitz Condition in Differentially Private Optimization

Differential Privacy in Distributed Optimization with Gradient Tracking

Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization

Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds