Abstract: No-regret algorithms are popular for learning Nash equilibrium (NE) in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs). Many recent works study the last-iterate convergence of no-regret algorithms. Among them, the two most famous algorithms are Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weight Update (OMWU). However, OGDA has high per-iteration complexity. OMWU has lower per-iteration complexity but poorer empirical performance, and its convergence holds only when the NE is unique. Recent works propose a Reward Transformation (RT) framework for MWU, which removes the uniqueness condition and achieves performance competitive with OMWU. Unfortunately, RT-based algorithms perform worse than OGDA under the same number of iterations, and their convergence guarantee relies on the continuous-time feedback assumption, which does not hold in most scenarios. To address these issues, we provide a closer analysis of the RT framework that holds for both continuous-time and discrete-time feedback. We demonstrate that the essence of the RT framework is to transform the problem of learning NE in the original game into a series of strongly convex-concave optimization problems (SCCPs). We show that the bottleneck of RT-based algorithms is the speed of solving SCCPs. To improve their empirical performance, we design a novel transformation method that enables the SCCPs to be solved by Regret Matching+ (RM+), a no-regret algorithm with better empirical performance, resulting in Reward Transformation RM+ (RTRM+). RTRM+ enjoys last-iterate convergence under the discrete-time feedback setting. Using the counterfactual regret decomposition framework, we propose Reward Transformation CFR+ (RTCFR+) to extend RTRM+ to EFGs. Experimental results show that our algorithms significantly outperform existing last-iterate convergence algorithms and RM+ (CFR+).
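
The reduction described above can be illustrated on a small matrix game. In that reduction, learning an NE becomes a sequence of strongly convex-concave problems, and each one is solved by RM+. The following is a minimal NumPy sketch under stated assumptions. It is not the paper's RTRM+. The squared-Euclidean penalty toward a reference strategy, the weight `tau`, the iteration counts, and the linear averaging are all illustrative assumptions. So is the rule that the averaged solution of one SCCP becomes the reference for the next. The function names `solve_sccp`, `prox_rm_plus`, and `duality_gap` are hypothetical. The convention is that x maximizes x^T A y and y minimizes it.

```python
import numpy as np


def rm_plus_strategy(q):
    """Normalize nonnegative cumulative regrets into a mixed strategy (uniform if all zero)."""
    total = q.sum()
    return q / total if total > 0 else np.full(q.shape[0], 1.0 / q.shape[0])


def solve_sccp(A, x_ref, y_ref, tau, iters):
    """Approximately solve the (hypothetical) SCCP
        max_x min_y  x^T A y - tau/2 ||x - x_ref||^2 + tau/2 ||y - y_ref||^2
    with simultaneous RM+ updates and linearly weighted averaging (as in CFR+)."""
    m, n = A.shape
    qx, qy = np.zeros(m), np.zeros(n)
    x_sum, y_sum, w_sum = np.zeros(m), np.zeros(n), 0.0
    for t in range(1, iters + 1):
        x, y = rm_plus_strategy(qx), rm_plus_strategy(qy)
        ux = A @ y - tau * (x - x_ref)          # gradient of the objective in x (x maximizes)
        uy = -(A.T @ x) - tau * (y - y_ref)     # negated gradient in y (y minimizes)
        qx = np.maximum(qx + ux - x @ ux, 0.0)  # RM+: add instantaneous regrets, clip at 0
        qy = np.maximum(qy + uy - y @ uy, 0.0)
        x_sum += t * x                          # iterate t gets weight t
        y_sum += t * y
        w_sum += t
    return x_sum / w_sum, y_sum / w_sum


def prox_rm_plus(A, x0, y0, tau=1.0, outer_iters=30, inner_iters=1000):
    """Outer loop (assumed): each SCCP solution becomes the next reference strategy."""
    x_ref, y_ref = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(outer_iters):
        x_ref, y_ref = solve_sccp(A, x_ref, y_ref, tau, inner_iters)
    return x_ref, y_ref


def duality_gap(A, x, y):
    """Sum of both players' best-response improvements; zero exactly at an NE."""
    return float((A @ y).max() - (A.T @ x).min())


if __name__ == "__main__":
    # Rock-paper-scissors (row payoffs); the unique NE is uniform for both players.
    A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
    rock = np.array([1.0, 0.0, 0.0])
    x, y = prox_rm_plus(A, rock, rock)
    print("gap at start:", duality_gap(A, rock, rock))  # prints "gap at start: 2.0"
    print("gap at end:", duality_gap(A, x, y))
```

The penalty makes each subproblem strongly concave in x and strongly convex in y, so each subproblem has a unique saddle point. If each SCCP were solved exactly, re-centering the penalty on the previous solution would be a proximal-point outer loop. RM+ only needs the gradient vectors, which keeps each inner step cheap. Under these assumptions, starting both players on Rock, the gap should fall well below its initial value of 2.0. No specific final value is claimed here.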

Efficient Last-iterate Convergence Algorithms in Solving Games