Generalized Optimistic Methods for Convex-Concave Saddle Point Problems

Ruichen Jiang, Aryan Mokhtari
2024-01-10
Abstract: The optimistic gradient method has seen increasing popularity for solving convex-concave saddle point problems. To analyze its iteration complexity, a recent work [arXiv:1906.01115] proposed an interesting perspective that interprets this method as an approximation to the proximal point method. In this paper, we follow this approach and distill the underlying idea of optimism to propose a generalized optimistic method, which includes the optimistic gradient method as a special case. Our general framework can handle constrained saddle point problems with composite objective functions and can work with arbitrary norms using Bregman distances. Moreover, we develop a backtracking line search scheme to select the step sizes without knowledge of the smoothness coefficients. We instantiate our method with first-, second- and higher-order oracles and give best-known global iteration complexity bounds. For our first-order method, we show that the averaged iterates converge at a rate of $O(1/N)$ when the objective function is convex-concave, and it achieves linear convergence when the objective is strongly-convex-strongly-concave. For our second- and higher-order methods, under the additional assumption that the distance-generating function has Lipschitz gradient, we prove a complexity bound of $O(1/\epsilon^{\frac{2}{p+1}})$ in the convex-concave setting and a complexity bound of $O((L_p D^{\frac{p-1}{2}}/\mu)^{\frac{2}{p+1}}+\log\log\frac{1}{\epsilon})$ in the strongly-convex-strongly-concave setting, where $L_p$ ($p\geq 2$) is the Lipschitz constant of the $p$-th-order derivative, $\mu$ is the strong convexity parameter, and $D$ is the initial Bregman distance to the saddle point. Moreover, our line search scheme provably only requires a constant number of calls to a subproblem solver per iteration on average, making our first- and second-order methods particularly amenable to implementation.
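The idea of optimism described in the abstract can be illustrated on a toy problem. The following is a minimal Python sketch, not the paper's general method: it assumes the unconstrained Euclidean case, a constant step size `eta` instead of a line search, and a hypothetical toy objective f(x, y) = x * y whose unique saddle point is (0, 0). With F(z) = (grad_x f, -grad_y f), the proximal point method would need the unknown value F(z_{k+1}); the optimistic gradient step replaces it with the prediction F(z_k) + (F(z_k) - F(z_{k-1})). All function names are illustrative.

```python
def operator(x, y):
    """F(z) = (df/dx, -df/dy) for the toy objective f(x, y) = x * y (an assumption)."""
    return y, -x


def optimistic_gradient(x0, y0, eta=0.25, num_iters=1000):
    """Optimistic gradient iterations with a constant step size.

    Update: z_{k+1} = z_k - eta * (2 * F(z_k) - F(z_{k-1})),
    i.e. a gradient step plus the correction eta * (F(z_k) - F(z_{k-1})).
    """
    x, y = x0, y0
    # Initialize F(z_{-1}) := F(z_0), so the first step is a plain gradient step.
    gx_prev, gy_prev = operator(x, y)
    for _ in range(num_iters):
        gx, gy = operator(x, y)
        x = x - eta * (2.0 * gx - gx_prev)
        y = y - eta * (2.0 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y


def gradient_descent_ascent(x0, y0, eta=0.25, num_iters=1000):
    """Plain simultaneous gradient descent-ascent, shown only for comparison."""
    x, y = x0, y0
    for _ in range(num_iters):
        gx, gy = operator(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y
```

On this example, `optimistic_gradient(1.0, 1.0)` approaches the saddle point (0, 0), whereas `gradient_descent_ascent(1.0, 1.0)` moves away from it, which is the usual motivation for the optimistic correction term.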
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
This paper addresses convex-concave saddle point problems, also known as minimax optimization problems. Specifically, it focuses on saddle point problems with composite objective functions:

\[ \min_{x \in X} \max_{y \in Y} \ell(x, y) := f(x, y) + h_1(x) - h_2(y) \]

where \(X \subset \mathbb{R}^m\) and \(Y \subset \mathbb{R}^n\) are non-empty closed convex sets, \(h_1: \mathbb{R}^m \to (-\infty, +\infty]\) and \(h_2: \mathbb{R}^n \to (-\infty, +\infty]\) are proper closed convex functions, and \(f\) is a smooth function defined on an open set containing \(X \times Y\). Additionally, \(f\) is assumed to be convex in \(x\) and concave in \(y\).

The main contributions of the paper include:

1. **Proposing a generalized optimistic method**, which views the optimistic gradient method as an approximation of the proximal point method and extends this idea to handle constrained saddle point problems with composite objective functions.
2. **Developing a backtracking line search scheme** for selecting step sizes without knowledge of the smoothness coefficients.
3. **Providing optimistic methods based on different order information** (first-order, second-order, and higher-order optimistic methods) and presenting the best-known global iteration complexity bounds for these methods in both the convex-concave and the strongly-convex-strongly-concave settings.

With these methods, the paper covers not only unconstrained smooth saddle point problems but also constrained problems with composite terms, with iteration complexity bounds that match the best known upper bounds.
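To make the composite structure \(f(x, y) + h_1(x) - h_2(y)\) concrete, the following Python sketch shows one way a first-order optimistic step can absorb the nonsmooth terms through a proximal operator. It is an illustration under stated assumptions, not the paper's algorithm: it assumes the Euclidean setting (no general Bregman distance), a constant step size `eta` rather than the backtracking line search, and hypothetical toy choices \(f(x, y) = xy\), \(h_1(x) = \lambda |x|\), \(h_2(y) = \lambda |y|\), for which the unique saddle point is (0, 0). Under these assumptions the step reads \(z_{k+1} = \mathrm{prox}_{\eta h}\big(z_k - \eta (2F(z_k) - F(z_{k-1}))\big)\), with \(F(z) = (\nabla_x f, -\nabla_y f)\) and the proximal operator applied coordinate-wise.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| for a scalar v (soft-thresholding)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def operator(x, y):
    """F(z) = (df/dx, -df/dy) for the smooth part f(x, y) = x * y (toy assumption)."""
    return y, -x


def composite_optimistic(x0, y0, lam=0.1, eta=0.25, num_iters=2000):
    """Optimistic step on the smooth part, followed by a proximal step.

    Assumes h1(x) = lam * |x| and h2(y) = lam * |y| (illustrative choices),
    so the proximal step is soft-thresholding with threshold eta * lam.
    """
    x, y = x0, y0
    gx_prev, gy_prev = operator(x, y)  # F(z_{-1}) := F(z_0)
    for _ in range(num_iters):
        gx, gy = operator(x, y)  # both components evaluated at the current z_k
        x = soft_threshold(x - eta * (2.0 * gx - gx_prev), eta * lam)
        y = soft_threshold(y - eta * (2.0 * gy - gy_prev), eta * lam)
        gx_prev, gy_prev = gx, gy
    return x, y
```

Calling `composite_optimistic(1.0, 1.0)` drives both coordinates toward the saddle point (0, 0) on this toy instance. Handling a constraint set such as a box would follow the same pattern with the soft-thresholding replaced by a projection, again as an assumption of this sketch rather than a statement about the paper's implementation.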