Abstract: We consider stochastic strongly-convex-strongly-concave (SCSC) saddle point (SP) problems, which frequently arise in applications ranging from distributionally robust learning to game theory and fairness in machine learning. We focus on the recently developed stochastic accelerated primal-dual (SAPD) algorithm, which, as an accelerated algorithm, admits optimal complexity in several settings. We provide high-probability guarantees for convergence to a neighborhood of the saddle point that reflect accelerated convergence behavior. We also provide an analytical formula for the limiting covariance matrix of the iterates for a class of stochastic SCSC quadratic problems where the gradient noise is additive and Gaussian. This allows us to develop lower bounds for this class of quadratic problems, which show that our analysis is tight in terms of the dependence of the high-probability bound on the parameters. We also provide a risk-averse convergence analysis characterizing the "Conditional Value at Risk", the "Entropic Value at Risk", and the $\chi^2$-divergence of the distance to the saddle point, highlighting the trade-offs between the bias and the risk associated with an approximate solution obtained by terminating the algorithm at any iteration.
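The two named risk measures can be made concrete on a finite sample of distances to the saddle point, for example the final distances collected from independent runs. The sketch below is a minimal illustration, not the paper's analysis: it assumes the standard Rockafellar-Uryasev form of CVaR and the standard variational form of EVaR at a confidence level $p \in [0, 1)$ (roughly, averaging over the worst $1-p$ fraction of outcomes). The function names `cvar` and `evar` are hypothetical, the $\chi^2$-divergence measure is not covered, and the EVaR infimum over $z > 0$ is only approximated by a grid search.

```python
import math


def cvar(samples, p):
    """Empirical CVaR at level p in [0, 1), Rockafellar-Uryasev form:
        CVaR_p(X) = min_t { t + E[(X - t)_+] / (1 - p) }.
    For an empirical distribution the objective is convex and piecewise
    linear in t with kinks at the samples, so the minimum is at a sample."""
    n = len(samples)
    best = math.inf
    for t in samples:
        excess = sum(max(s - t, 0.0) for s in samples) / n
        best = min(best, t + excess / (1.0 - p))
    return best


def evar(samples, p, num_grid=601):
    """Empirical EVaR at level p in [0, 1):
        EVaR_p(X) = inf_{z > 0} (1/z) * log(E[exp(z X)] / (1 - p)),
    approximated by a log-spaced grid search over z in [1e-3, 1e3]."""
    n = len(samples)
    best = math.inf
    for k in range(num_grid):
        z = 10.0 ** (-3.0 + 6.0 * k / (num_grid - 1))
        # Log-sum-exp with a max shift so exp() never overflows.
        m = max(z * s for s in samples)
        log_mean = m + math.log(sum(math.exp(z * s - m) for s in samples)) - math.log(n)
        best = min(best, (log_mean - math.log(1.0 - p)) / z)
    return best


# Hypothetical distances to the saddle point from eight independent runs.
distances = [0.8, 1.1, 0.9, 2.5, 1.0, 0.7, 1.3, 3.2]
print(cvar(distances, 0.75), evar(distances, 0.75))
```

Both quantities sit above the sample mean and penalize heavy upper tails; EVaR is never smaller than CVaR at the same level, so it is the more conservative of the two.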

What problem does this paper attempt to address?

The paper addresses the problem of providing high-probability and risk-averse guarantees for the stochastic accelerated primal-dual (SAPD) algorithm applied to stochastic strongly convex-strongly concave saddle point problems. Specifically, it studies how to solve such problems effectively when the partial gradients $\nabla_x \Phi$ and $\nabla_y \Phi$ cannot be computed exactly, but only through their stochastic estimates $\tilde{\nabla}_x \Phi$ and $\tilde{\nabla}_y \Phi$.

### Main problems

1. **High-probability convergence guarantees**: The paper guarantees that the algorithm converges to a neighborhood of the saddle point with high probability, and these guarantees reflect accelerated convergence behavior. Specifically, it analyzes the probability that the iterates reach a given accuracy level and derives high-probability bounds for convergence to a neighborhood of the saddle point.
2. **Risk-averse analysis**: In addition to the high-probability guarantees, the paper analyzes convergence under several risk measures: Conditional Value-at-Risk (CVaR), Entropic Value-at-Risk (EVaR) and the $\chi^2$-divergence. These measures quantify the risk associated with the distance between the saddle point and the approximate solution obtained when the algorithm is terminated.

### Mathematical description

The strongly convex-strongly concave saddle point problem considered in the paper has the form

\[
\min_{x \in X} \max_{y \in Y} L(x, y) \equiv f(x) + \Phi(x, y) - g(y),
\]

where $X$ and $Y$ are finite-dimensional Euclidean spaces, $f: X \to \mathbb{R} \cup \{+\infty\}$ and $g: Y \to \mathbb{R} \cup \{+\infty\}$ are closed convex functions, and $\Phi: X \times Y \to \mathbb{R}$ is a smooth convex-concave function such that $L(x, y)$ is strongly convex in $x$ and strongly concave in $y$.

### Key contributions

1. **High-probability convergence**: The paper is the first to provide an accelerated algorithm with high-probability guarantees, and these guarantees reflect accelerated behavior: the effect of the initial deviation decays linearly (geometrically in the iteration count $n$) at a rate governed by the condition number.
2. **Risk measures**: The paper provides finite-time risk guarantees in terms of CVaR, EVaR and the $\chi^2$-divergence, which quantify the risk of the approximate solution.
3. **Theoretical analysis**: The results are proved by constructing a new Lyapunov function $V_n$ with favorable contraction properties.

### Formula examples

- **Weighted squared distance**:
\[
D_n \equiv \frac{1}{2\tau} \|x_n - x^*\|^2 + \frac{1}{2}\left(\frac{1}{\sigma} - \alpha\right) \|y_n - y^*\|^2
\]
- **Convergence of the expected weighted squared distance**:
\[
\mathbb{E}[D_n] \leq \rho^n D_{\tau,\sigma} + \frac{\rho}{1 - \rho} \left( \frac{\tau}{1 + \tau \mu_x} \Xi_x^{\tau,\sigma,\theta} \nu_x^2 + \frac{\sigma}{1 + \sigma \mu_y} \Xi_y^{\tau,\sigma,\theta} \nu_y^2 \right)
\]
where $D_{\tau,\sigma} = \frac{1}{2\tau} \|x_0 - x^*\|^2 + \frac{1}{2\sigma} \|y_0 - y^*\|^2$ represents the initial deviation, $\Xi_x^{\tau,\sigma,\theta}$ and $\Xi_y^{\tau,\sigma,\theta}$
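The stochastic setting and the bias/noise split in the bound on $\mathbb{E}[D_n]$ can be illustrated with a small simulation. The sketch below is hypothetical and is not the authors' code: it runs our reading of the SAPD update (a dual prox step using the stochastic dual gradient with momentum parameter $\theta$, followed by a primal prox step evaluated at the new dual iterate) on the scalar problem $\min_x \max_y \frac{\mu_x}{2}x^2 + b\,xy - \frac{\mu_y}{2}y^2$, that is $f(x)=\frac{\mu_x}{2}x^2$, $\Phi(x,y)=b\,xy$, $g(y)=\frac{\mu_y}{2}y^2$, whose saddle point is $(0,0)$. The test problem, the additive Gaussian noise model, the default values $\tau=\sigma=\theta=0.5$ (for which the noise-free iteration contracts) and the name `sapd_scalar` are all assumptions chosen for illustration. The recorded quantity uses the same weights as $D_{\tau,\sigma}$.

```python
import random


def sapd_scalar(mu_x=1.0, mu_y=1.0, b=1.0, tau=0.5, sigma=0.5, theta=0.5,
                noise_std=0.0, n_iters=200, x0=1.0, y0=1.0, seed=0):
    """Hypothetical sketch of SAPD on the scalar SCSC problem
        min_x max_y (mu_x/2) x^2 + b*x*y - (mu_y/2) y^2,
    i.e. f(x) = (mu_x/2) x^2, Phi(x, y) = b*x*y, g(y) = (mu_y/2) y^2.
    The saddle point is (0, 0). Returns the weighted squared distance
    x^2/(2*tau) + y^2/(2*sigma) after each iteration."""
    rng = random.Random(seed)

    def grad_x(y):
        # Stochastic estimate of dPhi/dx = b*y with additive Gaussian noise.
        return b * y + rng.gauss(0.0, noise_std)

    def grad_y(x):
        # Stochastic estimate of dPhi/dy = b*x with additive Gaussian noise.
        return b * x + rng.gauss(0.0, noise_std)

    def prox_f(v):
        # prox_{tau f}(v) for f(x) = (mu_x/2) x^2.
        return v / (1.0 + tau * mu_x)

    def prox_g(v):
        # prox_{sigma g}(v) for g(y) = (mu_y/2) y^2.
        return v / (1.0 + sigma * mu_y)

    x, y = x0, y0
    gy_prev = None
    dists = []
    for _ in range(n_iters):
        gy = grad_y(x)                      # sample at (x_n, y_n)
        if gy_prev is None:
            gy_prev = gy                    # convention: no momentum at n = 0
        s = gy + theta * (gy - gy_prev)     # momentum on the dual gradient
        y = prox_g(y + sigma * s)           # dual step, gives y_{n+1}
        x = prox_f(x - tau * grad_x(y))     # primal step at (x_n, y_{n+1})
        gy_prev = gy                        # reuse this sample at the next step
        dists.append(x * x / (2 * tau) + y * y / (2 * sigma))
    return dists


clean = sapd_scalar(noise_std=0.0)
noisy = sapd_scalar(noise_std=0.1, n_iters=3000)
print(clean[-1], sum(noisy[1000:]) / len(noisy[1000:]))
```

Without noise the weighted distance decays geometrically, like the $\rho^n D_{\tau,\sigma}$ term. With noise it plateaus instead of vanishing, and in this linear example the plateau scales with the noise variance: reusing the same seed with twice the noise standard deviation gives four times the plateau, mirroring the $\nu_x^2$ and $\nu_y^2$ factors in the second term of the bound.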

High Probability and Risk-Averse Guarantees for a Stochastic Accelerated Primal-Dual Method

Robust Accelerated Primal-Dual Methods for Computing Saddle Points

Accelerated Primal-dual Scheme for a Class of Stochastic Nonconvex-concave Saddle Point Problems

General Procedure to Provide High-Probability Guarantees for Stochastic Saddle Point Problems

SAPD+: An Accelerated Stochastic Method for Nonconvex-Concave Minimax Problems

Optimal Primal-Dual Methods for a Class of Saddle Point Problems

High-probability complexity guarantees for nonconvex minimax problems

Switch and Conquer: Efficient Algorithms By Switching Stochastic Gradient Oracles For Decentralized Saddle Point Problems

A Randomized Block-Coordinate Primal-Dual Method for Large-scale Stochastic Saddle Point Problems

Stochastic Successive Convex Approximation for Non-Convex Constrained Stochastic Optimization

Stochastic Dual Ascent for Solving Linear Systems

A Central Limit Theorem for Algorithmic Estimator of Saddle Point

Accelerated stochastic approximation with state-dependent noise

Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications

A Stochastic Variance Reduced Primal Dual Fixed Point Method for Linearly Constrained Separable Optimization

Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

Constrained Stochastic Recursive Momentum Successive Convex Approximation

Stochastic Subgradient Descent Escapes Active Strict Saddles on Weakly Convex Functions

Almost-sure convergence of iterates and multipliers in stochastic sequential quadratic optimization