Fundamental Benefit of Alternating Updates in Minimax Optimization

Jaewook Lee, Hanseul Cho, Chulhee Yun
2024-07-15
Abstract: The Gradient Descent-Ascent (GDA) algorithm, designed to solve minimax optimization problems, takes the descent and ascent steps either simultaneously (Sim-GDA) or alternately (Alt-GDA). While Alt-GDA is commonly observed to converge faster, the performance gap between the two is not yet well understood theoretically, especially in terms of global convergence rates. To address this theory-practice gap, we present fine-grained convergence analyses of both algorithms for strongly-convex-strongly-concave and Lipschitz-gradient objectives. Our new iteration complexity upper bound of Alt-GDA is strictly smaller than the lower bound of Sim-GDA; i.e., Alt-GDA is provably faster. Moreover, we propose Alternating-Extrapolation GDA (Alex-GDA), a general algorithmic framework that subsumes Sim-GDA and Alt-GDA, for which the main idea is to alternately take gradients from extrapolations of the iterates. We show that Alex-GDA satisfies a smaller iteration complexity bound, identical to that of the Extra-gradient method, while requiring fewer gradient computations. We also prove that Alex-GDA enjoys linear convergence for bilinear problems, for which both Sim-GDA and Alt-GDA fail to converge at all.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
The paper "Fundamental Benefit of Alternating Updates in Minimax Optimization" addresses the theoretical performance gap between simultaneous and alternating update strategies in gradient descent-ascent (GDA) algorithms for minimax optimization. It focuses on two GDA variants: Sim-GDA (simultaneous updates) and Alt-GDA (alternating updates).

### Key Points

1. **Problem Statement**:
   - The paper deals with minimax problems of the form \(\min_x \max_y f(x,y)\), which arise in machine learning applications such as generative adversarial networks (GANs), adversarial training, reinforcement learning, and AUC maximization.
2. **Gradient Descent-Ascent (GDA)**:
   - GDA is a basic algorithm for solving minimax problems. It takes a descent step in \(x\) and an ascent step in \(y\), either simultaneously (Sim-GDA) or alternately (Alt-GDA).
3. **Objective**:
   - The main objective is to understand and quantify the difference between Sim-GDA and Alt-GDA in terms of global convergence rates.
4. **Contributions**:
   - The authors provide a fine-grained convergence analysis of Sim-GDA and Alt-GDA for strongly-convex-strongly-concave (SCSC) and Lipschitz-gradient objectives.
   - They prove that the iteration complexity upper bound of Alt-GDA is strictly smaller than the lower bound of Sim-GDA, so Alt-GDA is provably faster.
   - They introduce Alternating-Extrapolation GDA (Alex-GDA), a framework that subsumes Sim-GDA and Alt-GDA and alternately takes gradients from extrapolations of the iterates. Alex-GDA satisfies a smaller iteration complexity bound, identical to that of the Extra-gradient method, while requiring fewer gradient computations.
   - They also prove that Alex-GDA converges linearly on bilinear problems, where both Sim-GDA and Alt-GDA fail to converge.
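
The difference between the two update orders can be illustrated with a minimal Python sketch. It is not taken from the paper: the toy objective \(f(x,y) = \tfrac{\mu}{2}x^2 + bxy - \tfrac{\mu}{2}y^2\), the step size, the starting point, and all function names are assumptions chosen for illustration. The only change between the two step functions is whether the ascent step uses the old or the freshly updated \(x\).

```python
import math


def grads(x, y, mu, b):
    """Gradients of the assumed toy objective f(x, y) = (mu/2) x^2 + b x y - (mu/2) y^2."""
    gx = mu * x + b * y  # partial derivative with respect to x
    gy = b * x - mu * y  # partial derivative with respect to y
    return gx, gy


def sim_gda_step(x, y, eta, mu, b):
    """Sim-GDA: both gradients are evaluated at the same point (x, y)."""
    gx, gy = grads(x, y, mu, b)
    return x - eta * gx, y + eta * gy


def alt_gda_step(x, y, eta, mu, b):
    """Alt-GDA: descend in x first, then ascend in y using the updated x."""
    gx, _ = grads(x, y, mu, b)
    x_new = x - eta * gx
    _, gy = grads(x_new, y, mu, b)
    return x_new, y + eta * gy


def run(step, eta, mu, b, iters, x0=1.0, y0=1.0):
    """Run `iters` steps and return the distance to the saddle point (0, 0)."""
    x, y = x0, y0
    for _ in range(iters):
        x, y = step(x, y, eta, mu, b)
    return math.hypot(x, y)


if __name__ == "__main__":
    # SCSC toy problem (mu > 0) and bilinear toy problem (mu = 0), same step size.
    for name, mu in [("SCSC (mu=0.1)", 0.1), ("bilinear (mu=0)", 0.0)]:
        sim = run(sim_gda_step, eta=0.1, mu=mu, b=1.0, iters=1000)
        alt = run(alt_gda_step, eta=0.1, mu=mu, b=1.0, iters=1000)
        print(f"{name}: Sim-GDA distance = {sim:.3e}, Alt-GDA distance = {alt:.3e}")
```

With these assumed settings, Alt-GDA ends much closer to the saddle point than Sim-GDA on the SCSC instance. On the bilinear instance Sim-GDA moves away from the saddle point while Alt-GDA stays bounded without converging. This is a single toy instance with one shared step size, so it only hints at the behavior the paper analyzes; it does not reproduce the paper's bounds.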