Abstract: For min-max optimization and variational inequality problems (VIPs) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the von Neumann value. Finally, we establish that Richardson-Romberg extrapolation can improve proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.
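
For reference, the two recursions and the extrapolation step can be written in a common notation. The symbols here are assumptions introduced for illustration and are not taken from the abstract: $z_k$ is the iterate, $F$ the operator of the VIP, $\gamma$ the constant step-size, $\xi_k$ the random sample, and $\bar z^{(\gamma)}$ the average iterate obtained with step-size $\gamma$. Whether SEG reuses one sample for both steps or draws a fresh one is left open in this sketch.

$$
\begin{aligned}
\text{SGDA:}\quad & z_{k+1} = z_k - \gamma\, F(z_k;\xi_k), \\
\text{SEG:}\quad & z_{k+1/2} = z_k - \gamma\, F(z_k;\xi_k), \qquad z_{k+1} = z_k - \gamma\, F(z_{k+1/2};\xi_{k+1/2}), \\
\text{Richardson-Romberg:}\quad & \tilde z = 2\,\bar z^{(\gamma)} - \bar z^{(2\gamma)} .
\end{aligned}
$$

If the mean of the invariant distribution expands as $z^\ast + \gamma\,\Delta + O(\gamma^2)$ for some vector $\Delta$, the combination $2\,\bar z^{(\gamma)} - \bar z^{(2\gamma)}$ cancels the first-order term and leaves a bias of order $\gamma^2$.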

What problem does this paper attempt to address?

The paper analyzes the convergence behavior of Stochastic Gradient Descent Ascent (SGDA) and Stochastic Extragradient (SEG) with constant step-sizes on variational inequality problems (VIPs). It focuses on weakly quasi-strongly monotone VIPs, a class that covers a wide range of non-monotone and non-convex optimization problems. By treating the algorithms as time-homogeneous Markov chains, the authors study the distribution of the iterates and show that an invariant distribution exists and that the average iterate is asymptotically normal. The paper also characterizes the relationship between the step-size and the induced bias, and shows that Richardson-Romberg extrapolation reduces this bias.

### Main contributions of the paper:

1. **The iterates form a Harris positive recurrent Markov chain**: Under appropriate conditions, the iterates of SGDA and SEG form a Harris positive recurrent Markov chain, which guarantees the existence of a unique invariant distribution.
2. **Geometric convergence to the invariant distribution**: The distribution of the iterates converges geometrically to this invariant distribution.
3. **Law of large numbers and central limit theorem**: The authors establish both results for the iterates, which implies that the average iterate is asymptotically normal.
4. **Quantified induced bias**: The paper analyzes how the induced bias depends on the step-size and, for convex-concave min-max optimization, quantifies the effect of the step-size on the bias with respect to the von Neumann value.
5. **Richardson-Romberg extrapolation**: For SGDA applied to quasi-strongly monotone VIPs, the paper derives a first-order expansion of the induced bias and shows that Richardson-Romberg extrapolation reduces the bias.

### Main technical means:

- **Markov chain theory**: SGDA and SEG are treated as time-homogeneous Markov chains on a continuous state space, using tools such as the Markov chain central limit theorem and Richardson extrapolation.
- **Doeblin condition and Foster-Lyapunov inequality**: Positive recurrence and geometric convergence of the Markov chain are proved through the Doeblin condition and a Foster-Lyapunov inequality.
- **Stochastic analysis**: Assuming that the noise satisfies certain regularity conditions, the convergence behavior of the algorithms is studied with stochastic analysis methods.

### Application background:

- **Machine learning tasks**: Training Generative Adversarial Networks (GANs), Actor-Critic methods, multi-agent reinforcement learning, and robust learning. All of these can be formulated as variational inequality problems.
- **Optimization problems**: Loss minimization and saddle point problems, which are central to machine learning and optimization.

### Conclusion:

Through theoretical analysis and experimental verification, the paper provides a new perspective on the behavior of constant step-size SGDA and SEG for variational inequality problems. The results add to optimization theory and offer guidance for algorithm selection and step-size tuning in practice.
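
The pipeline summarized above (constant step-size SGDA, averaging after a burn-in, then Richardson-Romberg extrapolation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's experimental setup: the toy operator, the Gaussian noise model, the helper names (`operator`, `averaged_sgda`, `richardson_romberg`) and all parameter values are hypothetical choices made here. The operator is built so that the solution is the origin and so that it is nonlinear there, since a purely linear operator with additive noise would show no step-size bias in the mean.

```python
import numpy as np

S1 = 1.0 / (1.0 + np.exp(-1.0))  # sigmoid(1), used to place the solution at the origin


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def operator(z):
    """Toy operator F(z) = (grad_x f, -grad_y f) with F(0, 0) = 0.

    Assumed objective (hypothetical, for illustration only):
    f(x, y) = x^2/2 + softplus(x+1) - S1*x + x*y - y^2/2 - softplus(y+1) + S1*y,
    which is strongly convex in x and strongly concave in y.
    """
    x, y = z[..., 0], z[..., 1]
    fx = x + y + sigmoid(x + 1.0) - S1
    fy = -x + y + sigmoid(y + 1.0) - S1
    return np.stack([fx, fy], axis=-1)


def averaged_sgda(step, n_steps=2000, burn_in=500, n_chains=256, sigma=1.0, seed=0):
    """Run constant step-size SGDA on independent chains and return the
    post-burn-in average iterate (averaged over time and over chains)."""
    rng = np.random.default_rng(seed)
    z = np.ones((n_chains, 2))          # every chain starts away from the solution
    running_sum = np.zeros(2)
    for k in range(n_steps):
        noise = sigma * rng.standard_normal(z.shape)  # additive zero-mean noise (assumption)
        z = z - step * (operator(z) + noise)          # SGDA step on the noisy operator
        if k >= burn_in:
            running_sum += z.mean(axis=0)
    return running_sum / (n_steps - burn_in)


def richardson_romberg(avg_small, avg_large):
    """Combine averages obtained with step-sizes gamma and 2*gamma so that a
    first-order (linear in gamma) bias term cancels."""
    return 2.0 * avg_small - avg_large


if __name__ == "__main__":
    gamma = 0.1
    avg_gamma = averaged_sgda(gamma, seed=0)
    avg_2gamma = averaged_sgda(2.0 * gamma, seed=1)
    extrapolated = richardson_romberg(avg_gamma, avg_2gamma)
    # The solution is the origin, so each norm is the distance to the solution.
    print("distance with gamma:   ", np.linalg.norm(avg_gamma))
    print("distance with 2*gamma: ", np.linalg.norm(avg_2gamma))
    print("distance extrapolated: ", np.linalg.norm(extrapolated))
```

With the small simulation budget used here, Monte Carlo error can be of the same order as the bias itself, so a single run need not show the extrapolated estimate as the closest one; more chains or longer runs are needed to resolve the difference reliably.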

Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

Stochastic Extragradient with Random Reshuffling: Improved Convergence for Variational Inequalities

Improved Variance Reduction Extragradient Method with Line Search for Stochastic Variational Inequalities

Modified Stochastic Extragradient Methods for Stochastic Variational Inequality

Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction

An accelerated stochastic extragradient-like algorithm with new stepsize rules for stochastic variational inequalities

Methods for Solving Variational Inequalities with Markovian Stochasticity

Methods for Optimization Problems with Markovian Stochasticity and Non-Euclidean Geometry

Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo

Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications

High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

Stochastic Variance-Reduced Majorization-Minimization Algorithms

A New Algorithm for Stochastic Variational Inequality with an Application

Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation

Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Optimization

Demystifying SGD with Doubly Stochastic Gradients

Two-Stage Stochastic Variational Inequalities: Theory, Algorithms and Applications

Generalized Smooth Stochastic Variational Inequalities: Almost Sure Convergence and Convergence Rates

Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials

Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation

On the Bias-Variance Tradeoff in Stochastic Gradient Methods