High Probability and Risk-Averse Guarantees for a Stochastic Accelerated Primal-Dual Method

Yassine Laguel, Necdet Serhat Aybat, Mert Gürbüzbalaban
2023-07-14
Abstract: We consider stochastic strongly-convex-strongly-concave (SCSC) saddle point (SP) problems which frequently arise in applications ranging from distributionally robust learning to game theory and fairness in machine learning. We focus on the recently developed stochastic accelerated primal-dual algorithm (SAPD), which, as an accelerated algorithm, admits optimal complexity in several settings. We provide high probability guarantees for convergence to a neighborhood of the saddle point that reflect accelerated convergence behavior. We also provide an analytical formula for the limiting covariance matrix of the iterates for a class of stochastic SCSC quadratic problems where the gradient noise is additive and Gaussian. This allows us to develop lower bounds for this class of quadratic problems which show that our analysis is tight in terms of the dependence of the high probability bound on the parameters. We also provide a risk-averse convergence analysis characterizing the "Conditional Value at Risk", the "Entropic Value at Risk", and the $\chi^2$-divergence of the distance to the saddle point, highlighting the trade-offs between the bias and the risk associated with an approximate solution obtained by terminating the algorithm at any iteration.
Optimization and Control
What problem does this paper attempt to address?
The paper sets out to establish high-probability and risk-averse guarantees for the stochastic accelerated primal-dual algorithm (SAPD) on stochastic strongly-convex-strongly-concave saddle-point problems. Specifically, it studies the setting in which the partial gradients \(\nabla_x \Phi\) and \(\nabla_y \Phi\) are not available exactly and can only be accessed through stochastic estimates \(\tilde{\nabla}_x \Phi\) and \(\tilde{\nabla}_y \Phi\).

### Main problems

1. **High-probability convergence guarantees**: The paper shows that SAPD converges to a neighborhood of the saddle point with high probability, with bounds that reflect its accelerated convergence behavior. Concretely, it bounds the probability that an iterate fails to reach a given accuracy level, yielding high-probability bounds for convergence to a neighborhood of the saddle point.
2. **Risk-averse analysis**: Beyond the high-probability guarantees, the paper analyzes convergence under several risk measures: the Conditional Value at Risk (CVaR), the Entropic Value at Risk (EVaR), and the \(\chi^2\)-divergence of the distance to the saddle point. These measures quantify how far, and how riskily, the approximate solution obtained when the algorithm is terminated deviates from the true saddle point.

### Mathematical description

The strongly-convex-strongly-concave saddle-point problem considered in the paper has the form
\[ \min_{x \in X} \max_{y \in Y} L(x, y) \equiv f(x) + \Phi(x, y) - g(y), \]
where \(X\) and \(Y\) are finite-dimensional Euclidean spaces, \(f: X \to \mathbb{R} \cup \{+\infty\}\) and \(g: Y \to \mathbb{R} \cup \{+\infty\}\) are closed convex functions, and \(\Phi: X \times Y \to \mathbb{R}\) is a smooth convex-concave function, such that \(L(x, y)\) is strongly convex in \(x\) and strongly concave in \(y\).
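To make this setting concrete, here is a minimal sketch of an SAPD-style iteration on a hypothetical one-dimensional SCSC quadratic problem with additive Gaussian gradient noise, the kind of problem class for which the paper derives its limiting covariance. The update form (a momentum-corrected dual gradient with weight \(\theta\), a dual proximal step with step size \(\sigma\), then a primal proximal step with step size \(\tau\) at the new dual iterate) is assumed to follow the original SAPD method. The test problem, the function names `sapd_scalar` and `empirical_quantile`, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sapd_scalar(x0, y0, tau, sigma, theta, mu_x, mu_y, k, delta, n_iter, rng):
    """SAPD-style iteration (assumed form) on the hypothetical test problem
    L(x, y) = (mu_x / 2) x^2 + k x y - (mu_y / 2) y^2, whose saddle point is (0, 0).

    Here f(x) = (mu_x / 2) x^2, g(y) = (mu_y / 2) y^2 and Phi(x, y) = k x y, so both
    proximal steps have closed forms. Each partial gradient of Phi is perturbed by
    independent N(0, delta^2) noise (an additive Gaussian oracle).
    """
    x, y = x0, y0
    # Noisy y-gradient at the "previous" iterate; we take (x_{-1}, y_{-1}) = (x_0, y_0).
    gy_prev = k * x + delta * rng.gauss(0.0, 1.0)
    for _ in range(n_iter):
        gy = k * x + delta * rng.gauss(0.0, 1.0)
        # Momentum-corrected dual gradient: (1 + theta) * current - theta * previous.
        s = (1.0 + theta) * gy - theta * gy_prev
        # Dual step: argmin_y g(y) - s * y + (y - y_k)^2 / (2 sigma).
        y = (y + sigma * s) / (1.0 + sigma * mu_y)
        # The primal step uses the freshly updated dual iterate y_{k+1}.
        gx = k * y + delta * rng.gauss(0.0, 1.0)
        # Primal step: argmin_x f(x) + gx * x + (x - x_k)^2 / (2 tau).
        x = (x - tau * gx) / (1.0 + tau * mu_x)
        gy_prev = gy
    return x, y


def empirical_quantile(samples, p):
    """Lower empirical p-quantile: smallest sample whose empirical CDF is >= p."""
    ordered = sorted(samples)
    idx = max(1, math.ceil(p * len(ordered) - 1e-12))
    return ordered[idx - 1]


if __name__ == "__main__":
    params = dict(tau=0.5, sigma=0.5, theta=0.9, mu_x=1.0, mu_y=1.0, k=1.0)
    rng = random.Random(0)
    # Noise-free run: for these parameters the iterates contract to (0, 0).
    x, y = sapd_scalar(1.0, 1.0, delta=0.0, n_iter=200, rng=rng, **params)
    print(abs(x) + abs(y))
    # Noisy runs: upper quantiles of the squared distance over independent
    # replications are the empirical analogue of a high-probability bound.
    dists = []
    for _ in range(500):
        x, y = sapd_scalar(1.0, 1.0, delta=0.1, n_iter=100, rng=rng, **params)
        dists.append(x * x + y * y)
    print(empirical_quantile(dists, 0.5), empirical_quantile(dists, 0.9))
```

The step sizes here were chosen only so that this toy recursion is contractive (with \(\tau = \sigma = 0.5\), \(\theta = 0.9\), \(\mu_x = \mu_y = k = 1\), the noise-free iteration matrix has spectral radius of roughly 0.59); they are not the tuned parameters analyzed in the paper.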
### Key contributions

1. **High-probability convergence**: The paper is the first to provide high-probability guarantees for an accelerated primal-dual method on such problems. The guarantees capture the accelerated behavior: the contribution of the initial distance to the saddle point decays linearly (geometrically in the iteration count), at a rate whose dependence on the condition number reflects acceleration.
2. **Risk measures**: The paper provides finite-time guarantees on the CVaR, the EVaR, and the \(\chi^2\)-divergence of the distance to the saddle point, which quantify the risk associated with the approximate solution.
3. **Theoretical analysis**: The results are proved by constructing a new Lyapunov function \(V_n\) with favorable contraction properties.

### Formula examples

- **Weighted squared distance**:
\[ D_n \equiv \frac{1}{2\tau} \|x_n - x^*\|^2 + \frac{1}{2\left(\frac{1}{\sigma} - \alpha\right)} \|y_n - y^*\|^2 \]
- **Convergence of the expected weighted squared distance**:
\[ \mathbb{E}[D_n] \leq \rho^n D_{\tau,\sigma} + \frac{\rho}{1 - \rho} \left( \frac{\tau}{1 + \tau \mu_x} \Xi_x^{\tau,\sigma,\theta} \nu_x^2 + \frac{\sigma}{1 + \sigma \mu_y} \Xi_y^{\tau,\sigma,\theta} \nu_y^2 \right) \]
where \(D_{\tau,\sigma} = \frac{1}{2\tau} \|x_0 - x^*\|^2 + \frac{1}{2\sigma} \|y_0 - y^*\|^2\) is the weighted initial distance to the saddle point, \(\Xi_x^{\tau,\sigma,\theta}\) and \(\Xi_y^{\tau,\sigma,\theta}\)
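The quantities above can be evaluated numerically. The sketch below (standard library only) computes the weighted distance \(D_n\), evaluates the right-hand side of the bound on \(\mathbb{E}[D_n]\), and estimates CVaR and EVaR from a sample of distances, for example \(D_n\) collected over independent runs. Several assumptions are made: \(\rho\), \(\Xi_x^{\tau,\sigma,\theta}\), \(\Xi_y^{\tau,\sigma,\theta}\), \(\nu_x\) and \(\nu_y\) are passed in as plain numbers because their closed forms are not reproduced here; CVaR and EVaR use the convention that `p` is the confidence level (tail mass \(1 - p\)), which may differ from the paper's parametrization; and the infimum defining EVaR is approximated on a finite log-spaced grid, which gives an upper estimate. All function names are hypothetical.

```python
import math


def weighted_distance(x, y, x_star, y_star, tau, sigma, alpha):
    """D_n = ||x - x*||^2 / (2 tau) + ||y - y*||^2 / (2 (1/sigma - alpha))."""
    coef_y = 1.0 / sigma - alpha
    if coef_y <= 0.0:
        raise ValueError("need alpha < 1/sigma so that the y-weight is positive")
    dx2 = sum((a - b) ** 2 for a, b in zip(x, x_star))
    dy2 = sum((a - b) ** 2 for a, b in zip(y, y_star))
    return dx2 / (2.0 * tau) + dy2 / (2.0 * coef_y)


def expected_distance_bound(n, rho, d0, tau, sigma, mu_x, mu_y, xi_x, xi_y, nu_x, nu_y):
    """Right-hand side of the E[D_n] bound: bias rho^n * d0 plus a noise floor.

    xi_x, xi_y stand in for Xi_x^{tau,sigma,theta}, Xi_y^{tau,sigma,theta} and are
    supplied by the caller (assumption: their closed form is not computed here).
    """
    noise = (tau / (1.0 + tau * mu_x)) * xi_x * nu_x ** 2 \
        + (sigma / (1.0 + sigma * mu_y)) * xi_y * nu_y ** 2
    return rho ** n * d0 + rho / (1.0 - rho) * noise


def empirical_cvar(samples, p):
    """CVaR at level p of the empirical distribution (mean of the worst 1 - p tail),
    via the Rockafellar-Uryasev formula evaluated at t = empirical VaR_p."""
    z = sorted(samples)
    n = len(z)
    k = max(1, math.ceil(p * n - 1e-12))  # 1-based index of the lower p-quantile
    var_p = z[k - 1]
    excess = sum(max(v - var_p, 0.0) for v in z) / n
    return var_p + excess / (1.0 - p)


def empirical_evar(samples, p, t_min=1e-3, t_max=1e3, num=400):
    """EVaR at level p: inf over t > 0 of (1/t) * log(E[exp(t Z)] / (1 - p)),
    approximated by minimizing over a log-spaced grid of t (an upper estimate)."""
    n = len(samples)
    z_max = max(samples)
    best = math.inf
    for i in range(num):
        t = t_min * (t_max / t_min) ** (i / (num - 1))
        # log E[exp(t Z)], computed stably by factoring out exp(t * z_max).
        log_mgf = t * z_max + math.log(sum(math.exp(t * (v - z_max)) for v in samples) / n)
        best = min(best, (log_mgf - math.log(1.0 - p)) / t)
    return best
```

Evaluating `expected_distance_bound` for increasing `n` shows the bias term \(\rho^n D_{\tau,\sigma}\) vanishing while the noise floor \(\frac{\rho}{1-\rho}(\cdots)\) persists; comparing `empirical_cvar` and `empirical_evar` on the same sample of \(D_n\) values shows how much more conservative EVaR is than CVaR at the same level.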