Abstract: The optimistic gradient method has seen increasing popularity for solving convex-concave saddle point problems. To analyze its iteration complexity, a recent work [<a class="link-https" data-arxiv-id="1906.01115" href="https://arxiv.org/abs/1906.01115">arXiv:1906.01115</a>] proposed an interesting perspective that interprets this method as an approximation to the proximal point method. In this paper, we follow this approach and distill the underlying idea of optimism to propose a generalized optimistic method, which includes the optimistic gradient method as a special case. Our general framework can handle constrained saddle point problems with composite objective functions and can work with arbitrary norms using Bregman distances. Moreover, we develop a backtracking line search scheme to select the step sizes without knowledge of the smoothness coefficients. We instantiate our method with first-, second- and higher-order oracles and give best-known global iteration complexity bounds. For our first-order method, we show that the averaged iterates converge at a rate of $O(1/N)$ when the objective function is convex-concave, and it achieves linear convergence when the objective is strongly-convex-strongly-concave. For our second- and higher-order methods, under the additional assumption that the distance-generating function has Lipschitz gradient, we prove a complexity bound of $O(1/\epsilon^{\frac{2}{p+1}})$ in the convex-concave setting and a complexity bound of $O((L_pD^{\frac{p-1}{2}}/\mu)^{\frac{2}{p+1}}+\log\log\frac{1}{\epsilon})$ in the strongly-convex-strongly-concave setting, where $L_p$ ($p\geq 2$) is the Lipschitz constant of the $p$-th-order derivative, $\mu$ is the strong convexity parameter, and $D$ is the initial Bregman distance to the saddle point. Moreover, our line search scheme provably only requires a constant number of calls to a subproblem solver per iteration on average, making our first- and second-order methods particularly amenable to implementation.
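For orientation, the following is a minimal sketch of the classical optimistic gradient update that the abstract names as a special case. It is not the paper's generalized method: it uses a fixed step size rather than the backtracking line search, the Euclidean norm rather than a Bregman distance, and no constraints. The toy objective $f(x, y) = xy$, the step size 0.3, the iteration count, and all function names are assumptions chosen for illustration. The update is $z_{k+1} = z_k - \eta\,(2F(z_k) - F(z_{k-1}))$ with $F(z) = (\partial_x f, -\partial_y f)$.

```python
import math


def saddle_operator(x, y):
    # F(z) = (df/dx, -df/dy) for the toy objective f(x, y) = x * y (an assumption).
    return y, -x


def optimistic_gradient(x0, y0, step=0.3, iters=200):
    # Optimistic update: z_{k+1} = z_k - step * (2 F(z_k) - F(z_{k-1})).
    x, y = x0, y0
    gx_prev, gy_prev = saddle_operator(x, y)  # convention: F(z_{-1}) = F(z_0)
    for _ in range(iters):
        gx, gy = saddle_operator(x, y)
        x, y = x - step * (2 * gx - gx_prev), y - step * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y


def gradient_descent_ascent(x0, y0, step=0.3, iters=200):
    # Plain simultaneous descent-ascent, shown only as a baseline for comparison.
    x, y = x0, y0
    for _ in range(iters):
        gx, gy = saddle_operator(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


if __name__ == "__main__":
    ox, oy = optimistic_gradient(1.0, 1.0)
    bx, by = gradient_descent_ascent(1.0, 1.0)
    print("optimistic distance to saddle:", math.hypot(ox, oy))
    print("baseline distance to saddle:  ", math.hypot(bx, by))
```

On this bilinear toy problem the optimistic iterates approach the saddle point at the origin, while the plain descent-ascent baseline moves away from it; this behavior is specific to the example and step size chosen here and does not by itself illustrate the complexity bounds stated in the abstract.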

First-order methods almost always avoid strict saddle points

Riemannian stochastic optimization methods avoid strict saddle points

Stability of first-order methods in tame optimization

Inertial Newton Algorithms Avoiding Strict Saddle Points

Dealing with unbounded gradients in stochastic saddle-point optimization

Almost Sure Saddle Avoidance of Stochastic Gradient Methods without the Bounded Gradient Assumption

Avoiding strict saddle points of nonconvex regularized problems

First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time

Randomized First-Order Methods for Saddle Point Optimization

A Deterministic Gradient-Based Approach to Avoid Saddle Points

Block Coordinate Descent Almost Surely Converges to a Stationary Point Satisfying the Second-order Necessary Condition

Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points

Stochastic Subgradient Descent Escapes Active Strict Saddles on Weakly Convex Functions

Efficient First Order Method for Saddle Point Problems with Higher Order Smoothness

Gradient descent provably escapes saddle points in the training of shallow ReLU networks

Generalized Optimistic Methods for Convex-Concave Saddle Point Problems

Global stability of first-order methods for coercive tame functions

First-Order Methods for Nonsmooth Nonconvex Functional Constrained Optimization with or without Slater Points

Bound Analysis of Natural Gradient Descent in Stochastic Optimization Setting

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization