Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions

Hilal Asi, Daogao Liu, Kevin Tian
2024-06-05
Abstract: We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $G_2 \cdot \frac 1 {\sqrt n} + G_k \cdot (\frac{\sqrt d}{n\epsilon})^{1 - \frac 1 k}$ under $(\epsilon, \delta)$-approximate differential privacy, up to a mild $\textup{polylog}(\frac{1}{\delta})$ factor, where $G_2^2$ and $G_k^k$ are the $2^{\text{nd}}$ and $k^{\text{th}}$ moment bounds on sample Lipschitz constants, nearly matching a lower bound of [Lowy and Razaviyayn 2023]. We further give a suite of private algorithms in the heavy-tailed setting which improve upon our basic result under additional assumptions, including an optimal algorithm under a known-Lipschitz constant assumption, a near-linear time algorithm for smooth functions, and an optimal linear time algorithm for smooth generalized linear models.
Data Structures and Algorithms, Cryptography and Security, Machine Learning
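The Python sketch below evaluates the rate stated in the abstract so that it can be inspected numerically. It illustrates the formula under stated assumptions and is not code from the paper. The function names `moment_bound` and `heavy_tailed_rate` are hypothetical. The empirical moment is only a plug-in stand-in for the population bound $G_k$. All logarithmic and $\textup{polylog}(\frac{1}{\delta})$ factors are dropped.

```python
import math
from typing import Sequence


def moment_bound(lipschitz_constants: Sequence[float], k: float) -> float:
    """Plug-in estimate of G_k, where G_k**k bounds E[L**k].

    Hypothetical helper: replaces the population k-th moment with the
    empirical mean of L**k over observed sample Lipschitz constants.
    """
    if k <= 0 or not lipschitz_constants:
        raise ValueError("need k > 0 and at least one Lipschitz constant")
    mean_kth = sum(L ** k for L in lipschitz_constants) / len(lipschitz_constants)
    return mean_kth ** (1.0 / k)


def heavy_tailed_rate(g2: float, gk: float, k: float, n: int, d: int, eps: float) -> float:
    """G_2 / sqrt(n) + G_k * (sqrt(d) / (n * eps)) ** (1 - 1/k).

    Logarithmic and polylog(1/delta) factors are ignored.
    """
    statistical_term = g2 / math.sqrt(n)  # non-private statistical error
    privacy_term = gk * (math.sqrt(d) / (n * eps)) ** (1.0 - 1.0 / k)  # cost of privacy
    return statistical_term + privacy_term
```

Two sanity checks follow from the formula. First, as $k \to \infty$ with every sample $G$-Lipschitz, $G_k \to G$ and the exponent $1 - \frac 1 k \to 1$. The expression then approaches $\frac{G}{\sqrt n} + \frac{G\sqrt d}{n\epsilon}$, the familiar rate for DP-SCO with uniformly Lipschitz losses. Second, for $k \ge 2$ the power-mean inequality gives $G_k \ge G_2$, so the moment in the privacy term is never smaller than the one in the statistical term.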
What problem does this paper attempt to address?
This paper addresses differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients. Specifically, the authors study how to design algorithms that achieve optimal or near-optimal error rates when the Lipschitz constants of the sample functions satisfy only a k-th moment bound rather than a uniform bound.

### Background of the Paper and Problem Definition

**Background**:

- **Differentially Private Stochastic Convex Optimization (DP-SCO)**: This is a fundamental problem in statistics and machine learning. Given samples from a distribution, the goal is to find a solution that approximately minimizes the population loss while guaranteeing differential privacy for the data.
- **Heavy-Tailed Gradients**: In practice, the data distribution may be heavy-tailed. The Lipschitz constants of the sample functions (that is, the bounds on their gradient norms) may then have no uniform upper bound and satisfy only a bound on some higher-order moment.

**Limitations of Existing Methods**:

- Most existing DP-SCO algorithms assume that every sample function is Lipschitz with the same uniform constant. This assumption can be too strict in practice, especially when the data distribution is heavy-tailed.
- Existing methods for heavy-tailed gradients have shortcomings. They either require strong additional assumptions or fail to achieve the optimal error rate in some regimes.

### Main Contributions of the Paper

1. **New Reduction Technique**:
   - The authors propose a new reduction-based approach that, for the first time, achieves optimal or near-optimal error rates under heavy-tailed gradients.
   - Specifically, the algorithm achieves error \( G_2\cdot\frac{1}{\sqrt{n}}+G_k\cdot\left(\frac{\sqrt{d}}{n\varepsilon}\right)^{1 - \frac{1}{k}}\) under \((\varepsilon,\delta)\)-differential privacy, up to logarithmic factors and a mild \(\mathrm{polylog}(\frac{1}{\delta})\) factor. Here \( G_2^2\) and \( G_k^k\) are the 2nd and k-th moment bounds on the sample Lipschitz constants, respectively.
2. **Multiple Improved Algorithms**:
   - The authors also give a suite of private algorithms for the heavy-tailed setting that improve on the basic result under additional assumptions:
     - an optimal algorithm when the Lipschitz constant is known;
     - a near-linear-time algorithm for smooth functions;
     - an optimal linear-time algorithm for smooth generalized linear models.
3. **New Localization Framework**:
   - The authors design a new population-level localization framework. It needs only bounds that hold with constant success probability. This avoids dependence on high-order moments and bypasses technical obstacles faced by existing methods.

### Method Overview

1. **Clipped DP-SGD Subroutine**:
   - The authors first design a clipped DP-SGD subroutine that privately minimizes the objective of a regularized empirical risk minimization (ERM) subproblem.
   - The key idea is to clip per-sample gradients, which bounds the sensitivity of each step. Noise calibrated to that bound is then added to ensure privacy.
2. **Population-Level Localization Strategy**:
   - Using geometric aggregation techniques, the authors boost a subproblem solver that succeeds only with constant probability into one that succeeds with high probability.
   - This localization strategy lets the algorithm find an approximate minimizer of the population loss, not just the empirical loss.
3. **Theoretical Analysis**:
   - The authors give a detailed theoretical analysis proving that the algorithm achieves optimal or near-optimal error rates under heavy-tailed gradients.
   - Using tools such as Markov's inequality, the authors obtain constant-probability guarantees, which the aggregation step boosts. Together these prove that the algorithm succeeds with high probability.

### Conclusion

This paper addresses differentially private stochastic convex optimization with heavy-tailed gradients by proposing new reduction techniques and a population-level localization framework. The resulting algorithms provably achieve optimal or near-optimal error rates, up to logarithmic factors, which makes them a promising basis for private learning on heavy-tailed data.
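The clipped DP-SGD subroutine from the method overview can be illustrated with a minimal Python sketch of generic clipped, noisy minibatch SGD on a regularized ERM objective. This is not the authors' implementation, and the following choices are all assumptions made for illustration:

- a quadratic regularizer \(\frac{\lambda}{2}\|x - \bar{x}\|^2\) centred at a reference point;
- sampling with replacement;
- iterate averaging;
- every name and parameter below (`clipped_dp_sgd`, `clip_c`, `sigma`, `center`, and so on).

Calibrating `sigma` to a target \((\varepsilon,\delta)\) requires a privacy accountant, which is not included.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def clip(v: Sequence[float], c: float) -> Vector:
    """Rescale v so that its Euclidean norm is at most c (assumes c > 0)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= c:
        return list(v)
    return [x * (c / norm) for x in v]


def clipped_dp_sgd(
    grad_fn: Callable[[Vector, object], Sequence[float]],  # (sub)gradient of f(.; z) at x
    data: Sequence[object],
    x0: Sequence[float],
    center: Sequence[float],  # hypothetical regularization center
    lam: float,               # regularization strength
    clip_c: float,            # clipping threshold
    sigma: float,             # noise multiplier (not calibrated to (eps, delta) here)
    step: float,
    iters: int,
    batch: int,
    seed: int = 0,
) -> Vector:
    """Noisy clipped SGD on F(x) = mean_z f(x; z) + (lam / 2) * ||x - center||^2.

    Hypothetical sketch; returns the average of the iterates.
    """
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)
    avg = [0.0] * d
    for _ in range(iters):
        # Each clipped gradient has norm at most clip_c, which bounds the sum's sensitivity.
        g_sum = [0.0] * d
        for _ in range(batch):
            g = clip(grad_fn(x, rng.choice(data)), clip_c)
            for j in range(d):
                g_sum[j] += g[j]
        # Gaussian noise scaled to the sensitivity clip_c, then average over the batch.
        noisy = [(g_sum[j] + rng.gauss(0.0, sigma * clip_c)) / batch for j in range(d)]
        # The regularizer's gradient does not depend on the data, so it is neither clipped nor noised.
        x = [x[j] - step * (noisy[j] + lam * (x[j] - center[j])) for j in range(d)]
        for j in range(d):
            avg[j] += x[j] / iters
    return avg
```

The bound \(G_k\) enters through the clipping threshold. Clipping at threshold \(C\) introduces a bias of at most \(\mathbb{E}[L\,\mathbf{1}\{L > C\}] \le G_k^k / C^{k-1}\), while the added noise grows linearly in \(C\). Balancing these two effects is what ties the choice of threshold to the k-th moment bound.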