Abstract: In the context of finite sums minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as well as their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms since they are more stable with respect to the choice of the step size. However, their variance-reduced versions are not as well studied as the gradient ones. In this work, we propose the first unified study of variance reduction techniques for stochastic proximal point algorithms. We introduce a generic stochastic proximal-based algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants. For this algorithm, in the smooth setting, we provide several convergence rates for the iterates and the objective function values, which are faster than those of the vanilla stochastic proximal point algorithm. More specifically, for convex functions, we prove a sublinear convergence rate of $O(1/k)$. In addition, under the Polyak-Łojasiewicz (PL) condition, we obtain linear convergence rates. Finally, our numerical experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts in terms of the stability with respect to the choice of the step size in most cases, especially for difficult problems.

What problem does this paper attempt to address?

This paper applies variance reduction techniques from finite-sum optimization to the Stochastic Proximal Point Algorithm (SPPA). Specifically, it proposes a unified variance reduction framework for SPPA that can be specialized to proximal versions of SVRG (Stochastic Variance Reduced Gradient), SAGA, and their variants. In the smooth setting, the authors prove convergence rates that are faster than those of vanilla SPPA: a sublinear rate of $O(1/k)$ for convex functions and linear rates under the Polyak-Łojasiewicz (PL) condition. In addition, numerical experiments show that, in most cases and especially on difficult problems, the proposed proximal variance-reduced methods are more stable with respect to the choice of step size than their gradient counterparts.

### Background of the Paper and Problem Definition

In machine learning and deep learning, a common optimization problem is Empirical Risk Minimization (ERM), whose goal is to minimize an objective function of the following form:

\[ \min_{x \in H} F(x)=\frac{1}{n} \sum_{i = 1}^{n} f_i(x), \]

where $H$ is a separable Hilbert space, $f_i: H\rightarrow\mathbb{R}$ is the loss function associated with the $i$-th data point, $n$ is the number of data points, and $x\in H$ contains the model parameters. For large-scale data sets, traditional Gradient Descent (GD) is very expensive in both computation and storage, because every step requires the gradients of all $n$ functions $f_i$. Various variants of Stochastic Gradient Descent (SGD) have therefore been proposed in recent years. However, SGD usually converges more slowly than deterministic GD and is very sensitive to the choice of step size.
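The cost difference between GD and SGD described above can be made concrete with a minimal sketch. It assumes a least-squares loss $f_i(x) = \frac{1}{2}(\langle a_i, x\rangle - b_i)^2$ on $H = \mathbb{R}^2$ and a small synthetic data set; the loss, the data, the step size, and all function names are illustrative assumptions, not taken from the paper.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def grad_f_i(x, a, b):
    """Gradient of f_i(x) = 0.5 * (<a, x> - b)^2 for one data point (a, b)."""
    r = dot(a, x) - b
    return [r * aj for aj in a]


def objective(x, data):
    """F(x) = (1/n) * sum_i f_i(x)."""
    return sum(0.5 * (dot(a, x) - b) ** 2 for a, b in data) / len(data)


def full_gradient(x, data):
    """Gradient of F: touches all n data points (the cost of one GD step)."""
    n = len(data)
    g = [0.0] * len(x)
    for a, b in data:
        gi = grad_f_i(x, a, b)
        g = [gj + gij / n for gj, gij in zip(g, gi)]
    return g


def sgd_step(x, data, step, rng):
    """One SGD step: touches a single randomly chosen data point."""
    a, b = data[rng.randrange(len(data))]
    gi = grad_f_i(x, a, b)
    return [xj - step * gij for xj, gij in zip(x, gi)]


# Hypothetical noise-free toy data: b_i = <a_i, x_true>.
rng = random.Random(0)
x_true = [1.0, -2.0]
data = []
for _ in range(50):
    a = [rng.gauss(0, 1), rng.gauss(0, 1)]
    data.append((a, dot(a, x_true)))

x = [0.0, 0.0]
start_value = objective(x, data)
for _ in range(2000):
    x = sgd_step(x, data, step=0.05, rng=rng)
end_value = objective(x, data)
print(start_value, end_value)
```

Each call to `sgd_step` costs one per-sample gradient, whereas `full_gradient` costs $n$ of them. The step size `0.05` was picked by hand for this toy problem, which is exactly the kind of tuning the summary refers to.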
### Stochastic Proximal Point Algorithm (SPPA)

As an alternative, the Stochastic Proximal Point Algorithm (SPPA) has attracted attention because it is more stable with respect to the choice of step size. At each iteration, SPPA applies the proximal operator of a randomly sampled $f_i$ instead of taking a gradient step. However, variance reduction techniques for SPPA have received relatively little study.

### Variance Reduction Techniques

Variance reduction techniques such as SVRG and SAGA reduce the variance of the stochastic gradient estimates, which allows the algorithm to recover the convergence rates of standard GD. These techniques have been widely studied for SGD, but their application to SPPA is relatively limited.

### Contributions of the Paper

1. **Unified Variance Reduction Technique**: The paper proposes a unified variance reduction framework for SPPA. The framework can be specialized to proximal versions of SVRG, SAGA, and L-SVRG.
2. **Improved Convergence Rates**: In the smooth setting, the paper proves convergence rates faster than those of vanilla SPPA. For convex functions, it proves a sublinear convergence rate of $O(1/k)$; under the PL condition, it proves a linear convergence rate.
3. **Numerical Experiments**: The experimental results show that, in most cases and especially on difficult problems, the proposed proximal variance-reduced methods are more stable with respect to the choice of step size than their gradient counterparts.

### Conclusion

By proposing a unified variance reduction framework, this paper improves the convergence guarantees of SPPA while retaining its stability with respect to the step size. This both extends the range of application of variance reduction techniques and provides a new tool for large-scale optimization problems.
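The step-size stability attributed to SPPA can be illustrated with a minimal sketch. It covers plain SPPA only, not the paper's variance-reduced algorithm, and it assumes a least-squares loss $f_i(x) = \frac{1}{2}(\langle a_i, x\rangle - b_i)^2$, chosen because its proximal operator has a closed form. The four data points, the deliberately large step size, and all function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sgd_step(x, a, b, step):
    """Gradient step on f_i(x) = 0.5 * (<a, x> - b)^2."""
    r = dot(a, x) - b
    return [xj - step * r * aj for xj, aj in zip(x, a)]


def prox_step(x, a, b, step):
    """Proximal step: argmin_y f_i(y) + ||y - x||^2 / (2 * step).

    For the least-squares loss the minimizer has a closed form in which
    the residual is damped by 1 / (1 + step * ||a||^2).
    """
    r = (dot(a, x) - b) / (1.0 + step * dot(a, a))
    return [xj - step * r * aj for xj, aj in zip(x, a)]


def run(update, data, step, n_iter, seed):
    """Apply `update` to a randomly sampled data point at each iteration."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n_iter):
        a, b = data[rng.randrange(len(data))]
        x = update(x, a, b, step)
    return x


# Hypothetical noise-free toy data: b_i = <a_i, x_true>.
x_true = [1.0, -2.0]
features = [[1.0, 1.0], [0.5, 1.0], [1.0, -1.0], [-1.0, 0.5]]
data = [(a, dot(a, x_true)) for a in features]

step = 5.0  # deliberately far too large for SGD on this problem
initial_err = math.dist([0.0, 0.0], x_true)
sgd_err = math.dist(run(sgd_step, data, step, 30, seed=0), x_true)
sppa_err = math.dist(run(prox_step, data, step, 30, seed=0), x_true)
print(initial_err, sgd_err, sppa_err)
```

Both runs see the same sequence of sampled indices. With this step size every gradient step moves the iterate away from `x_true`, while every proximal step is nonexpansive and keeps `x_true` fixed, so the SPPA error cannot grow. This toy comparison is only meant to convey the intuition and says nothing about the paper's own experiments.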

Variance reduction techniques for stochastic proximal point algorithms
