A Unified Theory of Stochastic Proximal Point Methods without Smoothness

Peter Richtárik, Abdurakhmon Sadiev, Yury Demidovich
2024-05-25
Abstract: This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM). Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning, a trait not shared by the dominant stochastic gradient descent (SGD) algorithm. A framework of assumptions that we introduce encompasses methods employing techniques such as variance reduction and arbitrary sampling. A cornerstone of our general theoretical approach is a parametric assumption on the iterates, correction and control vectors. We establish a single theorem that ensures linear convergence under this assumption and the $\mu$-strong convexity of the loss function, and without the need to invoke smoothness. This integral theorem reinstates the best-known complexity and convergence guarantees for several existing methods, which demonstrates the robustness of our approach. We expand our study by developing three new variants of SPPM, and through numerical experiments we elucidate various properties inherent to them.
Optimization and Control, Machine Learning
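For readers new to SPPM, the basic step samples one component $f_i$ and sets $x_{t+1} = \mathrm{prox}_{\gamma f_i}(x_t)$, where $\mathrm{prox}_{\gamma f_i}(v) = \arg\min_x \{ f_i(x) + \frac{1}{2\gamma}\|x - v\|^2 \}$. The Python sketch below illustrates this plain update on a hypothetical problem chosen only because its prox is easy to compute: ridge-regularized least squares, $f_i(x) = \frac12(a_i^\top x - b_i)^2 + \frac{\mu}{2}\|x\|^2$, which makes every $f_i$ $\mu$-strongly convex. The function names (`prox_ridge_ls`, `sppm`), the data and the step size are our assumptions, not the paper's experimental setup.

```python
import numpy as np


def prox_ridge_ls(v, a, b, gamma, mu):
    """Exact prox of f_i(x) = 0.5*(a^T x - b)^2 + (mu/2)*||x||^2 at v (hypothetical example).

    Setting the gradient of f_i(x) + ||x - v||^2 / (2*gamma) to zero gives
    ((1/gamma + mu) I + a a^T) x = v/gamma + a*b, a small linear system.
    """
    d = v.shape[0]
    lhs = (1.0 / gamma + mu) * np.eye(d) + np.outer(a, a)
    return np.linalg.solve(lhs, v / gamma + a * b)


def sppm(A, b, mu, gamma, iters, seed=0):
    """Plain SPPM: sample i uniformly at random, then x <- prox_{gamma f_i}(x)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        x = prox_ridge_ls(x, A[i], b[i], gamma, mu)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((100, 5))
    b = A @ np.array([1.0, -2.0, 3.0, -1.0, 2.0])
    mu = 0.1
    # Minimizer of f(x) = (1/n) sum_i f_i(x), obtained from its normal equations.
    x_star = np.linalg.solve(A.T @ A / 100 + mu * np.eye(5), A.T @ b / 100)
    x = sppm(A, b, mu, gamma=0.05, iters=3000)
    print("relative error:", np.linalg.norm(x - x_star) / np.linalg.norm(x_star))
```

With a constant step size, plain SPPM contracts toward $x_\star$ linearly but only down to a neighborhood whose size shrinks with $\gamma$; no smoothness constant enters the update itself, since each step only needs the prox of $f_i$.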
What problem does this paper attempt to address?
The paper addresses the theoretical analysis of the stochastic proximal point method (SPPM) and the design of new variants. Specifically, it aims to:

1. **Provide a unified theoretical framework**: The paper introduces a single set of assumptions, built around a parametric assumption on the iterates, correction and control vectors, that covers a wide range of SPPM variants, including those using variance reduction and arbitrary sampling. These assumptions apply not only to standard SPPM but also to its many variants.
2. **Avoid smoothness assumptions**: Many analyses of optimization algorithms assume that the objective function is smooth, i.e., that its gradient is Lipschitz continuous. A key contribution of this paper is a proof of linear convergence under $\mu$-strong convexity without relying on smoothness, which extends the guarantees for SPPM and its variants to a wider class of problems.
3. **Develop new SPPM variants**: The paper proposes three new SPPM variants and studies their behavior through numerical experiments. The methods discussed include:
   - **SPPM-NS**: uses a nonuniform sampling strategy, which can improve the convergence rate and change the size of the neighborhood of the solution to which the method converges.
   - **SPPM-AS**: builds on the arbitrary sampling framework, giving a unified convergence analysis for different sampling and minibatch strategies.
   - **SPPM\* and SPPM-GC**: these two methods combine SPPM with variance-reduction techniques.
4. **Verify the theoretical results**: Numerical experiments confirm the theoretical analysis and show how the methods behave on practical problems. In particular, the SPPM variants that use variance reduction converge faster and to a smaller neighborhood of the optimal solution.
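Nonuniform sampling can be illustrated by rewriting $f = \frac1n\sum_i f_i = \sum_i p_i \frac{f_i}{n p_i}$ and running plain SPPM on the reweighted components: sample $i$ with probability $p_i$, then take a prox step with step size $\gamma$ on $f_i/(n p_i)$, which is the same as a prox step on $f_i$ with step size $\gamma/(n p_i)$. This is a generic importance-sampling sketch under our own assumptions, not necessarily the exact SPPM-NS update from the paper. The test problem (hypothetical ridge-regularized least squares, $f_i(x) = \frac12(a_i^\top x - b_i)^2 + \frac{\mu}{2}\|x\|^2$), the probabilities $p_i \propto \|a_i\|^2 + \mu$ and all function names are illustrative choices.

```python
import numpy as np


def prox_ridge_ls(v, a, b, step, mu):
    """Exact prox of f_i(x) = 0.5*(a^T x - b)^2 + (mu/2)*||x||^2 with the given step.

    Solves ((1/step + mu) I + a a^T) x = v/step + a*b (first-order optimality).
    """
    d = v.shape[0]
    lhs = (1.0 / step + mu) * np.eye(d) + np.outer(a, a)
    return np.linalg.solve(lhs, v / step + a * b)


def importance_probs(A, mu):
    """Hypothetical sampling rule: p_i proportional to ||a_i||^2 + mu."""
    w = np.einsum("ij,ij->i", A, A) + mu
    return w / w.sum()


def sppm_ns(A, b, mu, gamma, iters, seed=0):
    """SPPM with nonuniform sampling via the reweighting f_i / (n p_i) (a sketch)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    p = importance_probs(A, mu)
    idx = rng.choice(n, size=iters, p=p)  # pre-sample all indices i ~ p
    x = np.zeros(d)
    for i in idx:
        # prox of f_i/(n p_i) with step gamma == prox of f_i with step gamma/(n p_i)
        x = prox_ridge_ls(x, A[i], b[i], gamma / (n * p[i]), mu)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((100, 5))
    b = A @ np.array([1.0, -2.0, 3.0, -1.0, 2.0])
    mu = 0.1
    x_star = np.linalg.solve(A.T @ A / 100 + mu * np.eye(5), A.T @ b / 100)
    x = sppm_ns(A, b, mu, gamma=0.02, iters=10000)
    print("relative error:", np.linalg.norm(x - x_star) / np.linalg.norm(x_star))
```

Because $\sum_i p_i \cdot \frac{1}{n p_i}\nabla f_i(x_\star) = \nabla f(x_\star) = 0$, the reweighted method targets the same minimizer as uniform SPPM; the choice of $p$ only changes the per-step contraction and the size of the neighborhood.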
In summary, the main objective of this paper is to provide a unified theoretical framework that gives SPPM and its variants rigorous convergence guarantees under broader conditions, and to extend the reach of SPPM through new methods and experimental validation.