Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

Weiwei Kong, Mónica Ribero
2024-07-07
Abstract: Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume the loss function is Lipschitz continuous and convex, sample the batches randomly with replacement, or omit the gradient clipping step. In this work, we analyze the most commonly used variant of DP-SGD, in which we sample batches cyclically with replacement, perform gradient clipping, and only release the last DP-SGD iterate. More specifically, without assuming convexity, smoothness, or Lipschitz continuity of the loss function, we establish new Rényi differential privacy (RDP) bounds for the last DP-SGD iterate under the mild assumptions that (i) the DP-SGD stepsize is small relative to the topological constants in the loss function, and (ii) the loss function is weakly-convex. Moreover, we show that our bounds converge to previously established convex bounds when the weak-convexity parameter of the objective function approaches zero. In the case of non-Lipschitz smooth loss functions, we provide a weaker bound that scales well in terms of the number of DP-SGD iterations.
Machine Learning, Data Structures and Algorithms, Optimization and Control
What problem does this paper attempt to address?
This paper addresses the mismatch between existing privacy analyses of differentially private stochastic gradient descent (DP-SGD) and the way DP-SGD is implemented in practice. Most existing analyses assume idealized properties of the loss function, such as convexity, smoothness, and Lipschitz continuity, and these assumptions often fail to hold in practice. They also typically assume that batches are sampled randomly and omit the gradient clipping step, neither of which matches actual implementations. The main contributions of the paper are as follows:

1. **Privacy Bounds under Weak Assumptions**: The paper establishes new Rényi differential privacy (RDP) bounds without assuming that the loss function is convex, smooth, or Lipschitz continuous. The bounds only require that the DP-SGD step size is small relative to the topological constants of the loss function and that the loss function is weakly convex.
2. **Convergence of the Bounds**: As the weak-convexity parameter of the objective function approaches zero, the bounds converge to the previously established bounds for the convex case. For non-Lipschitz smooth loss functions, the paper provides a weaker bound that scales well in the number of DP-SGD iterations.
3. **Parameter Influence**: The paper shows that the privacy bound can be reduced by decreasing the SGD step size, increasing the standard deviation of the Gaussian noise in DP-SGD, or increasing the batch size.

### Problem Definition

This paper focuses on DP-SGD applied to composite optimization problems of the form

\[ \min_{x \in \mathbb{R}^n} \left\{ \varphi(x) := \frac{1}{k} \sum_{i = 1}^k f_i(x) + h(x) \right\} \]

where \(h\) is a proper, lower-semicontinuous convex function and each \(f_i\) is continuously differentiable on the domain of \(h\).
In particular, \(h\) can be a common nonsmooth regularizer, such as the \(\ell_1\) norm \(\|\cdot\|_1\), the nuclear matrix norm \(\|\cdot\|_*\), or the elastic-net regularizer, or it can be the indicator function of a closed convex set.

### Types of Bounds

This paper establishes RDP bounds under three different conditions:

1. **Without Additional Assumptions**: In this case, the bound has the form
\[ D_\alpha(X_T \| X'_T) \preceq \alpha \cdot \frac{T (\lambda C)^2}{\sigma^2} \]
2. **Assuming the DP-SGD Iterates Lie within an \(\ell_2\) Ball**: In this case, the bound has the form
\[ D_\alpha(X_T \| X'_T) \preceq \alpha \cdot \frac{(d_h + \lambda C / b)^2}{\sigma^2} \]
where \(d_h\) is the diameter of the domain of \(h\).
3. **Assuming Each \(\nabla f_i\) Is Lipschitz Continuous**: In this case, the bound has the form
\[ D_\alpha(X_T \| X'_T) \preceq \alpha \cdot \frac{T}{\ell} \left( \frac{L^2 \ell}{\lambda} \sum_{i = 1}^\ell \frac{L_i^2}{\lambda} \right) \left( \frac{\lambda C}{b} \right)^2 \]
where \(L\) and \(L_i\) are finite constants.

### Mathematical Techniques

1. **Weak Convexity**: This paper generalizes existing analyses for the convex, twice-differentiable setting to the weakly convex (possibly nonsmooth) composite setting. Using recently developed optimal transport techniques and a curvature characterization of nonconvex functions, the paper shows how the bounds converge when the weak-convexity parameter is scaled appropriately.
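
To make the analyzed setting concrete, here is a minimal Python sketch of a DP-SGD variant with cyclic batches, per-example gradient clipping, Gaussian noise, and a proximal step for \(h\), returning only the last iterate. It is an illustration under stated assumptions, not the authors' implementation: the least-squares choice of \(f_i\), the choice \(h = \text{reg} \cdot \|\cdot\|_1\), the placement and scaling of the noise, and all function and parameter names (`dp_sgd_last_iterate`, `lam`, `C`, `sigma`, `b`, `T`, `reg`) are hypothetical.

```python
import numpy as np


def clip(g, C):
    """Rescale g so that its Euclidean norm is at most C."""
    norm = np.linalg.norm(g)
    return g if norm <= C else g * (C / norm)


def prox_l1(x, t):
    """Proximal operator of t * ||.||_1, i.e. soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def dp_sgd_last_iterate(A, y, lam, C, sigma, b, T, reg=0.1, seed=0):
    """Cyclic-batch DP-SGD sketch that returns only the final iterate.

    Assumptions (not taken from the paper):
      * f_i(x) = 0.5 * (A[i] @ x - y[i])**2 and h(x) = reg * ||x||_1,
      * the batch size b divides the number of examples k,
      * Gaussian noise with standard deviation sigma is added to the
        averaged clipped gradient.
    """
    rng = np.random.default_rng(seed)
    k, n = A.shape
    num_batches = k // b
    x = np.zeros(n)
    for t in range(T):
        # Cyclic sampling: visit the batches in a fixed, repeating order.
        j = t % num_batches
        idx = range(j * b, (j + 1) * b)
        # Clip each per-example gradient, then average over the batch.
        grads = [clip((A[i] @ x - y[i]) * A[i], C) for i in idx]
        g = np.mean(grads, axis=0)
        noise = rng.normal(0.0, sigma, size=n)
        # Noisy gradient step followed by the proximal step for h.
        x = prox_l1(x - lam * (g + noise), lam * reg)
    return x  # intermediate iterates are never released


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    A = data_rng.normal(size=(8, 3))
    y = data_rng.normal(size=8)
    print(dp_sgd_last_iterate(A, y, lam=0.05, C=1.0, sigma=0.5, b=2, T=20))
```

In this sketch, the quantities named in the contributions appear as `lam` (step size), `sigma` (noise standard deviation), and `b` (batch size), so their effect on the iterates can be explored directly; the sketch does not compute any privacy bound.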