Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates

Puning Zhao, Jiafei Wu, Zhe Liu, Chong Wang, Rongfei Fan, Qingming Li

2024-08-19

Abstract: We study convex optimization problems under differential privacy (DP). With heavy-tailed gradients, existing works achieve suboptimal rates. The main obstacle is that existing gradient estimators have suboptimal tail properties, resulting in a superfluous factor of $d$ in the union bound. In this paper, we explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients. Our first method is a simple clipping approach. Under bounded $p$-th order moments of gradients, with $n$ samples, it achieves $\tilde{O}(\sqrt{d/n}+\sqrt{d}(\sqrt{d}/n\epsilon)^{1-1/p})$ population risk with $\epsilon\leq 1/\sqrt{d}$. We then propose an iterative updating method, which is more complex but achieves this rate for all $\epsilon\leq 1$. The results significantly improve over existing methods. Such improvement relies on a careful treatment of the tail behavior of gradient estimators. Our results match the minimax lower bound in \cite{kamath2022improved}, indicating that the theoretical limit of stochastic convex optimization under DP is achievable.
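To get a feel for the rate stated in the abstract, the two terms can be evaluated numerically. The sketch below is only illustrative: it drops constants and logarithmic factors, reads $\sqrt{d}/n\epsilon$ as $\sqrt{d}/(n\epsilon)$ (an assumption about the intended grouping), and the function name `risk_rate` is hypothetical.

```python
import math


def risk_rate(n, d, eps, p):
    """Return the two terms of sqrt(d/n) + sqrt(d) * (sqrt(d)/(n*eps))**(1 - 1/p).

    Constants and log factors hidden by the O-tilde notation are ignored.
    Assumption: sqrt(d)/n*eps in the abstract means sqrt(d) / (n * eps).
    """
    statistical_term = math.sqrt(d / n)
    privacy_term = math.sqrt(d) * (math.sqrt(d) / (n * eps)) ** (1.0 - 1.0 / p)
    return statistical_term, privacy_term


# Illustrative values only (not taken from the paper).
stat, priv = risk_rate(n=10_000, d=100, eps=0.1, p=2)
print(f"statistical term: {stat:.3f}, privacy term: {priv:.3f}")
```

As $p$ grows, the exponent $1-1/p$ approaches 1 and the privacy term approaches $d/(n\epsilon)$, so a lighter tail (a higher bounded moment) shrinks the privacy term whenever $\sqrt{d}/(n\epsilon) < 1$.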

Machine Learning, Cryptography and Security, Data Structures and Algorithms

What problem does this paper attempt to address?

This paper studies stochastic convex optimization with heavy-tailed data under differential privacy (DP). Specifically:

1. **Problems with existing methods**:
   - Existing methods achieve suboptimal rates when gradients are heavy-tailed.
   - The main obstacle is that existing gradient estimators have suboptimal tail properties, which introduces a superfluous factor of $d$ in the union bound.
2. **Research objectives**:
   - Explore algorithms that achieve optimal rates for DP optimization with heavy-tailed gradients.
   - Propose two methods: a simple clipping method and an iterative updating method.
   - Show that these methods significantly improve over existing ones and match the minimax lower bound, indicating that the theoretical limit is achievable.
3. **Specific contributions**:
   - Simple clipping method: satisfies DP by clipping gradients and adding noise; it attains the stated rate for smaller privacy budgets ($\epsilon \leq 1/\sqrt{d}$).
   - Iterative updating method: divides the data into multiple groups, computes an estimate for each group, and iteratively updates; it attains the same rate for all $\epsilon \leq 1$.

Through these methods, the paper addresses the problem of achieving optimal rates for DP stochastic optimization with heavy-tailed data.
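The "clip gradients, then add noise" idea behind the simple clipping method can be sketched as a private mean estimator for per-sample gradients. This is a minimal illustration and not the paper's algorithm: the helper names, the clipping radius, the use of $(\epsilon,\delta)$-DP with the classical Gaussian mechanism, and the Student-t test data are all assumptions made for the example.

```python
import numpy as np


def clip_rows(grads, clip_radius):
    """Rescale each row so that its L2 norm is at most clip_radius."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.minimum(1.0, clip_radius / np.maximum(norms, 1e-12))
    return grads * scale


def clipped_noisy_mean(grads, clip_radius, eps, delta, rng):
    """Average clipped per-sample gradients and add Gaussian noise.

    Assumption: (eps, delta)-DP via the classical Gaussian mechanism,
    whose standard calibration is stated for eps < 1.
    """
    n, d = grads.shape
    mean = clip_rows(grads, clip_radius).mean(axis=0)
    # Replacing one sample moves the clipped mean by at most 2R/n in L2 norm.
    sensitivity = 2.0 * clip_radius / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return mean + rng.normal(0.0, sigma, size=d)


# Hypothetical heavy-tailed "gradients": Student-t with 3 degrees of freedom.
rng = np.random.default_rng(0)
grads = rng.standard_t(df=3, size=(5000, 10))
estimate = clipped_noisy_mean(grads, clip_radius=5.0, eps=0.3, delta=1e-5, rng=rng)
print(estimate.shape)
```

Clipping bounds each sample's influence, so the noise scale depends only on the radius and the sample count. The cost is bias from truncating the heavy tail, which is why the choice of radius matters for the final rate.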
