High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
2023-07-18
Abstract: During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central $\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth non-convex / Polyak-Lojasiewicz / convex / strongly convex / quasi-strongly convex minimization problems, (ii) Lipschitz / star-cocoercive and monotone / quasi-strongly monotone variational inequalities. These results justify the usage of the considered methods for solving problems that do not fit standard functional classes studied in stochastic optimization.
Optimization and Control, Machine Learning
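For reference, the bounded central $\alpha$-th moment condition from the abstract can be written as below. The notation (stochastic gradient $\nabla f_\xi(x)$, stochastic operator $F_\xi(x)$, set $Q$, constant $\sigma$) is our assumption and is not defined in this excerpt; the paper may state the condition only on a restricted set $Q$ rather than on all of $\mathbb{R}^d$.

$$
\mathbb{E}_\xi\big[\nabla f_\xi(x)\big] = \nabla f(x), \qquad
\mathbb{E}_\xi\big[\|\nabla f_\xi(x) - \nabla f(x)\|^\alpha\big] \le \sigma^\alpha
\quad \text{for all } x \in Q,\ \alpha \in (1, 2],
$$

with $F_\xi$ and $F$ in place of $\nabla f_\xi$ and $\nabla f$ for variational inequalities. Setting $\alpha = 2$ recovers the usual bounded-variance assumption, while for $\alpha < 2$ the variance may be infinite: for example, Student-t noise with $\nu \in (1, 2]$ degrees of freedom has infinite variance but a finite $\alpha$-th moment for every $\alpha < \nu$. Clipped methods such as clipped-SGD rescale each stochastic gradient with a clipping operator, in its commonly used form

$$
\operatorname{clip}(g, \lambda) = \min\Big\{1, \frac{\lambda}{\|g\|}\Big\}\, g, \qquad \operatorname{clip}(0, \lambda) = 0,
$$

so that every clipped vector has norm at most $\lambda$.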
What problem does this paper attempt to address?
The paper addresses the convergence of stochastic optimization methods under assumptions weaker than those typically required in the literature. Specifically, the authors develop algorithms with high-probability convergence guarantees that remain valid when the variance of the gradient noise is unbounded, a significant departure from the common bounded-variance assumption.

### Key Contributions

1. **Relaxed Assumptions**: The proposed algorithms handle gradient noise with unbounded variance, requiring only that the noise have a bounded central α-th moment for α ∈ (1, 2]. This covers heavy-tailed noise distributions, which are more realistic in many practical applications.
2. **Extensive Coverage of Problem Classes**:
   - **Minimization Problems**: smooth non-convex, Polyak-Łojasiewicz (PL), convex, strongly convex, and quasi-strongly convex problems.
   - **Variational Inequalities**: Lipschitz, star-cocoercive, monotone, and quasi-strongly monotone variational inequalities.
3. **New High-Probability Convergence Results**:
   - For clipped-SGD and clipped-SSTM in several minimization settings, including convex, strongly convex, PL, and quasi-strongly convex problems.
   - For clipped-SEG and clipped-SGDA applied to variational inequalities under different structured non-monotonicity assumptions.
4. **Optimality of Results**:
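The clipped-SGD update named among the contributions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the clipping rule clip(g, λ) = min{1, λ/‖g‖}·g, a constant step size `gamma` and clipping level `lam` (the paper's theorems prescribe specific values that are not reproduced here), and a hypothetical test problem f(x) = ½‖x‖² whose gradient oracle adds coordinate-wise Student-t noise with ν = 1.5 degrees of freedom. That noise has infinite variance but a finite α-th moment for every α < 1.5, so it falls outside the bounded-variance setting while satisfying the α-th moment condition. The helper names `clip`, `student_t`, and `clipped_sgd` are our own.

```python
import math
import random


def clip(g, lam):
    """Clipping operator: min(1, lam / ||g||) * g; the zero vector is returned unchanged."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(g)
    scale = min(1.0, lam / norm)
    return [scale * gi for gi in g]


def student_t(rng, nu):
    """Student-t sample via Z / sqrt(V / nu), with Z ~ N(0, 1) and V ~ chi^2(nu) = Gamma(nu/2, scale=2)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)
    return z / math.sqrt(v / nu)


def clipped_sgd(x0, gamma, lam, n_iters, nu=1.5, seed=0):
    """Run x_{k+1} = x_k - gamma * clip(g_k, lam) on the hypothetical f(x) = 0.5 * ||x||^2.

    Illustrative assumption: the oracle returns grad f(x) = x plus independent
    Student-t noise (nu degrees of freedom) in each coordinate. For nu = 1.5 the
    noise variance is infinite, but every iterate moves by at most gamma * lam.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_iters):
        stoch_grad = [xi + student_t(rng, nu) for xi in x]  # heavy-tailed gradient estimate
        step = clip(stoch_grad, lam)                         # norm of step is at most lam
        x = [xi - gamma * si for xi, si in zip(x, step)]
    return x


if __name__ == "__main__":
    x_final = clipped_sgd([10.0, 10.0, 10.0], gamma=0.05, lam=1.0, n_iters=3000)
    # Distance from the final iterate to the minimizer x* = 0.
    print(math.sqrt(sum(v * v for v in x_final)))
```

Clipping bounds every step by `gamma * lam`, so a single extreme noise draw cannot throw the iterate far away; the price is a bias in the clipped gradient estimator, which analyses of clipped methods control through the choice of λ.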