Abstract: No-regret algorithms are popular for learning Nash equilibrium (NE) in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs). Many recent works study no-regret algorithms with last-iterate convergence. The two best known are Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weight Update (OMWU). However, OGDA has high per-iteration complexity. OMWU has lower per-iteration complexity but poorer empirical performance, and its convergence guarantee holds only when the NE is unique. Recent works propose a Reward Transformation (RT) framework for MWU, which removes the uniqueness condition and achieves performance competitive with OMWU. Unfortunately, RT-based algorithms perform worse than OGDA under the same number of iterations, and their convergence guarantee relies on the continuous-time feedback assumption, which does not hold in most scenarios. To address these issues, we provide a closer analysis of the RT framework that holds for both continuous-time and discrete-time feedback. We demonstrate that the essence of the RT framework is to transform the problem of learning NE in the original game into a series of strongly convex-concave optimization problems (SCCPs). We show that the bottleneck of RT-based algorithms is the speed of solving these SCCPs. To improve their empirical performance, we design a novel transformation method that enables the SCCPs to be solved by Regret Matching+ (RM+), a no-regret algorithm with better empirical performance, resulting in Reward Transformation RM+ (RTRM+). RTRM+ enjoys last-iterate convergence under the discrete-time feedback setting. Using the counterfactual regret decomposition framework, we propose Reward Transformation CFR+ (RTCFR+) to extend RTRM+ to EFGs. Experimental results show that our algorithms significantly outperform existing algorithms with last-iterate convergence as well as RM+ (CFR+).
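As background for the abstract, the following is a minimal sketch of plain RM+ in self-play on a two-player zero-sum matrix game. It is not RTRM+: the reward transformation is not reproduced here, and the sketch reports the averaged strategies, which is where standard RM+ carries its guarantee, rather than the last iterate. The function names, the payoff matrix, and the iteration count are all hypothetical choices made for illustration.

```python
def rm_plus_strategy(q):
    """Play proportionally to the clipped cumulative regrets; uniform if all are zero."""
    total = sum(q)
    if total <= 0.0:
        return [1.0 / len(q)] * len(q)
    return [v / total for v in q]


def rm_plus_self_play(A, iterations=10000):
    """Run RM+ for both players on payoff matrix A (row player maximizes A[i][j]).

    Returns the uniformly averaged strategies of the row and column player.
    """
    n, m = len(A), len(A[0])
    qx, qy = [0.0] * n, [0.0] * m          # clipped cumulative regrets
    avg_x, avg_y = [0.0] * n, [0.0] * m    # running averages of the strategies
    for t in range(1, iterations + 1):
        x, y = rm_plus_strategy(qx), rm_plus_strategy(qy)
        # Expected utility of each pure action against the opponent's current strategy.
        ux = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        uy = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        vx = sum(x[i] * ux[i] for i in range(n))
        vy = sum(y[j] * uy[j] for j in range(m))
        # RM+ update: add the instantaneous regret, then clip at zero.
        qx = [max(qx[i] + ux[i] - vx, 0.0) for i in range(n)]
        qy = [max(qy[j] + uy[j] - vy, 0.0) for j in range(m)]
        avg_x = [a + (xi - a) / t for a, xi in zip(avg_x, x)]
        avg_y = [a + (yj - a) / t for a, yj in zip(avg_y, y)]
    return avg_x, avg_y


def exploitability(A, x, y):
    """Duality gap of (x, y): best-response gain of the row player plus that of the column player."""
    n, m = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical 2x2 zero-sum game
    x_bar, y_bar = rm_plus_self_play(A)
    print("average strategies:", x_bar, y_bar)
    print("exploitability:", exploitability(A, x_bar, y_bar))
```

The exploitability printed at the end is the quantity that last-iterate methods such as those in the abstract aim to drive to zero for the current strategy itself, not only for the average.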

Exploiting a No-Regret Opponent in Repeated Zero-Sum Games

No-Regret Learning in Time-Varying Zero-Sum Games

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

Robust No-Regret Learning in Min-Max Stackelberg Games

Is Learning in Games Good for the Learners?

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games

An efficient model‐free adaptive optimal control of continuous‐time nonlinear non‐zero‐sum games based on integral reinforcement learning with exploration

Optimize Neural Fictitious Self-Play in Regret Minimization Thinking

Safe Opponent-Exploitation Subgame Refinement

Efficient Methods for Non-stationary Online Learning

In-Context Exploiter for Extensive-Form Games

Efficient Last-iterate Convergence Algorithms in Solving Games

No-Regret Learning in Network Stochastic Zero-Sum Games

Online Markov Decision Processes with Non-Oblivious Strategic Adversary

Doubly Optimal No-Regret Learning in Monotone Games

L2E: Learning to Exploit Your Opponent

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

No-Regret Learning for Stackelberg Equilibrium Computation in Newsvendor Pricing Games

Maximizing utility in multi-agent environments by anticipating the behavior of other learners

Neural Auto-Curricula in Two-Player Zero-Sum Games

On the Convergence of No-Regret Learning Dynamics in Time-Varying Games