Abstract: We consider stochastic optimization problems with heavy-tailed noise with structured density. For such problems, we show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$, when the stochastic gradients have finite moments of order $\alpha \in (1, 2]$. In particular, our analysis allows the noise norm to have an unbounded expectation. To achieve these results, we stabilize stochastic gradients using smoothed medians of means. We prove that the resulting estimates have negligible bias and controllable variance. This allows us to carefully incorporate them into clipped-SGD and clipped-SSTM and derive new high-probability complexity bounds in the considered setup.
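The smoothed median of means mentioned in the abstract can be illustrated with a short sketch. The abstract does not spell out the construction, so this assumes one plausible version: split the samples into blocks, average each block, perturb each block mean with independent Gaussian noise of scale `sigma` (the "smoothing"), and take the coordinate-wise median. The function name `smoothed_median_of_means`, the Gaussian smoothing, and the parameters `block_size` and `sigma` are hypothetical; the paper's exact estimator may differ.

```python
import math
import random
import statistics
from typing import List, Sequence

Vector = List[float]


def smoothed_median_of_means(samples: Sequence[Vector], block_size: int,
                             sigma: float, rng: random.Random) -> Vector:
    """Coordinate-wise median of block means, each perturbed by N(0, sigma^2) noise.

    ASSUMPTION: the Gaussian perturbation is an illustrative smoothing choice,
    not necessarily the paper's construction. Samples after the last full
    block are ignored.
    """
    dim = len(samples[0])
    coords: List[List[float]] = [[] for _ in range(dim)]
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        for j in range(dim):
            block_mean = sum(x[j] for x in block) / block_size
            coords[j].append(block_mean + rng.gauss(0.0, sigma))
    return [statistics.median(c) for c in coords]


def cauchy(rng: random.Random) -> float:
    """Standard Cauchy draw: symmetric, heavy-tailed, with no finite mean."""
    return math.tan(math.pi * (rng.random() - 0.5))


# Hypothetical example: noisy observations of a fixed gradient vector.
rng = random.Random(0)
true_grad = [1.0, -2.0]
samples = [[g + cauchy(rng) for g in true_grad] for _ in range(903)]
estimate = smoothed_median_of_means(samples, block_size=3, sigma=0.1, rng=rng)
print([round(v, 2) for v in estimate])  # coordinates close to 1.0 and -2.0
```

Under standard Cauchy noise the plain sample mean does not concentrate (the average of i.i.d. standard Cauchy variables is again standard Cauchy), whereas the median of the block means stays close to the true gradient because the noise is symmetric.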

What problem does this paper attempt to address?

This paper addresses the convergence-rate bottleneck in stochastic optimization with heavy-tailed noise. When the stochastic gradients have a finite $\alpha$-th moment with $\alpha\in(1, 2]$, existing methods typically achieve only a rate of $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$, where $K$ is the number of iterations. This rate degrades sharply as $\alpha$ approaches 1, and at $\alpha = 1$ the exponent $2(\alpha - 1)/\alpha$ vanishes, so the bound no longer guarantees convergence. To overcome this, the authors stabilize the stochastic gradients with smoothed medians of means. The resulting estimates have negligible bias and controllable variance, so they can be incorporated into clipped stochastic gradient descent (clipped-SGD) and the clipped Stochastic Similar Triangles Method (clipped-SSTM), an accelerated method, yielding new high-probability complexity bounds.

### Main Contributions

1. **New assumption**: The authors introduce a new assumption (Assumption 2.1) on the structure of the noise density. It allows the noise to have a finite $\alpha$-th moment and to include an asymmetric part.
2. **Analysis of the smoothed median**: The authors give a non-asymptotic analysis of the smoothed median, proving that even under heavy-tailed noise it yields estimates with small bias and controllable variance.
3. **Improved convergence rates**: Using the smoothed median, the authors obtain faster rates for clipped-SGD and clipped-SSTM. In particular, for smooth strongly convex problems, the dominant term of the upper bound decays as $\widetilde{\mathcal{O}}(K^{-1})$, which is better than $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$ (when $\alpha < 4/3$).
4. **Symmetric noise**: For symmetric noise distributions, the authors obtain rates that match state-of-the-art results under the bounded-variance assumption, up to a logarithmic factor.

### Paper Structure

- **Section 2**: Notation and problem setup.
- **Section 3**: Related work.
- **Section 4**: The smoothed median and its properties.
- **Section 5**: Main results, including the convergence analysis of clipped-SGD and clipped-SSTM.
- **Section 6**: Experiments evaluating the proposed methods.

### Conclusion

By introducing a new noise assumption and the smoothed-median technique, the authors overcome the convergence-rate bottleneck caused by heavy-tailed noise in stochastic optimization, providing new theoretical guarantees and practical methods for optimization under heavy-tailed noise.
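The pipeline summarized above, a smoothed median of means fed into clipped-SGD, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' algorithm or tuning: the objective $f(x) = \tfrac12\|x - x^\star\|^2$ is smooth and strongly convex, the gradient noise is i.i.d. standard Cauchy in each coordinate (symmetric, with no finite mean, so the expected noise norm is unbounded), and the Gaussian smoothing, step size, clipping level, batch size, and block size are hypothetical choices. Each step builds the estimate from a fresh batch, clips it to Euclidean norm at most `clip_level`, and takes a gradient step.

```python
import math
import random
import statistics
from typing import Callable, List

Vector = List[float]


def cauchy(rng: random.Random) -> float:
    """Standard Cauchy draw: symmetric noise whose absolute value has infinite mean."""
    return math.tan(math.pi * (rng.random() - 0.5))


def smoothed_median_of_means(samples: List[Vector], block_size: int,
                             sigma: float, rng: random.Random) -> Vector:
    """Coordinate-wise median of block means, each perturbed by N(0, sigma^2).

    ASSUMPTION: Gaussian smoothing is an illustrative choice, not the paper's.
    """
    dim = len(samples[0])
    coords: List[List[float]] = [[] for _ in range(dim)]
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        for j in range(dim):
            mean_j = sum(x[j] for x in block) / block_size
            coords[j].append(mean_j + rng.gauss(0.0, sigma))
    return [statistics.median(c) for c in coords]


def clip(g: Vector, lam: float) -> Vector:
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = math.sqrt(sum(v * v for v in g))
    return list(g) if norm <= lam else [v * lam / norm for v in g]


def clipped_sgd_smoothed_median(
    stoch_grad: Callable[[Vector, random.Random], Vector],
    x0: Vector, steps: int, step_size: float, clip_level: float,
    batch: int, block_size: int, sigma: float, rng: random.Random,
) -> Vector:
    """x_{k+1} = x_k - step_size * clip(estimate from a batch, clip_level)."""
    x = list(x0)
    for _ in range(steps):
        samples = [stoch_grad(x, rng) for _ in range(batch)]
        g = clip(smoothed_median_of_means(samples, block_size, sigma, rng), clip_level)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical toy problem: gradient x - x_star observed with Cauchy noise.
x_star = [1.0, -2.0, 0.5]


def noisy_grad(x: Vector, rng: random.Random) -> Vector:
    return [xi - si + cauchy(rng) for xi, si in zip(x, x_star)]


rng = random.Random(0)
x_final = clipped_sgd_smoothed_median(
    noisy_grad, x0=[0.0, 0.0, 0.0], steps=400, step_size=0.05,
    clip_level=5.0, batch=63, block_size=3, sigma=0.1, rng=rng,
)
print([round(v, 2) for v in x_final])
```

The median suppresses the Cauchy outliers that would make an averaged gradient useless, while clipping bounds each update whenever the estimate is still occasionally large; with these settings the final iterate should land near `x_star`.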

Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems

Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation

High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

High-Probability Complexity Bounds for Non-smooth Stochastic Convex Optimization with Heavy-Tailed Noise

Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees

High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

Optimal Rates for the Last Iterate of the Stochastic subgradient Method under Heavy-Tails

Algorithms with Gradient Clipping for Stochastic Optimization with Heavy-Tailed Noise

High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise

Multiplicative noise and heavy tails in stochastic optimization

High-Probability Bound for Non-Smooth Non-Convex Stochastic Optimization with Heavy Tails

Gradient-Free Methods for Non-Smooth Convex Stochastic Optimization with Heavy-Tailed Noise on Convex Compact

High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise

Stochastic Optimization with Non-stationary Noise: the Power of Moment Estimation

Stochastic Optimization with Non-stationary Noise

Distributed Stochastic Strongly Convex Optimization under Heavy-Tailed Noises

First Order Stochastic Optimization with Oblivious Noise

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

Median Clipping for Zeroth-order Non-Smooth Convex Optimization and Multi-Armed Bandit Problem with Heavy-tailed Symmetric Noise

Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithms