Abstract: No-regret algorithms are popular for learning a Nash equilibrium (NE) in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs). Many recent works study no-regret algorithms with last-iterate convergence. Among them, the two best-known algorithms are Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weight Update (OMWU). However, OGDA has high per-iteration complexity. OMWU has lower per-iteration complexity but poorer empirical performance, and its convergence holds only when the NE is unique. Recent works propose a Reward Transformation (RT) framework for MWU, which removes the uniqueness condition and achieves performance competitive with OMWU. Unfortunately, RT-based algorithms perform worse than OGDA under the same number of iterations, and their convergence guarantee relies on the continuous-time feedback assumption, which does not hold in most scenarios. To address these issues, we provide a closer analysis of the RT framework, which holds for both continuous-time and discrete-time feedback. We demonstrate that the essence of the RT framework is to transform the problem of learning an NE in the original game into a series of strongly convex-concave optimization problems (SCCPs). We show that the bottleneck of RT-based algorithms is the speed of solving SCCPs. To improve their empirical performance, we design a novel transformation method that enables the SCCPs to be solved by Regret Matching+ (RM+), a no-regret algorithm with better empirical performance, resulting in Reward Transformation RM+ (RTRM+). RTRM+ enjoys last-iterate convergence under the discrete-time feedback setting. Using the counterfactual regret decomposition framework, we propose Reward Transformation CFR+ (RTCFR+) to extend RTRM+ to EFGs. Experimental results show that our algorithms significantly outperform existing last-iterate convergence algorithms and RM+ (CFR+).
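
For readers unfamiliar with RM+, the building block named in the abstract, the following is a minimal sketch of plain Regret Matching+ self-play on a two-player zero-sum NFG. It is not RTRM+: the abstract does not give the reward transformation, so none is attempted here. The function names (`rm_plus_strategy`, `solve_rm_plus`, `exploitability`), the payoff matrix, and the use of uniformly averaged strategies are illustrative assumptions, not taken from the paper.

```python
def rm_plus_strategy(q):
    """Turn nonnegative cumulative regrets into a strategy (uniform if all are zero)."""
    total = sum(q)
    n = len(q)
    return [v / total for v in q] if total > 0 else [1.0 / n] * n


def solve_rm_plus(A, iterations=5000):
    """Plain RM+ self-play on the matrix game x^T A y (row maximizes, column minimizes).

    Returns the uniformly averaged strategies of both players. This is vanilla RM+,
    used only as an illustration; it is NOT the RTRM+ algorithm of the paper.
    """
    n, m = len(A), len(A[0])
    qx, qy = [0.0] * n, [0.0] * m          # clipped cumulative regrets
    sum_x, sum_y = [0.0] * n, [0.0] * m    # running sums for the average strategies
    for _ in range(iterations):
        x, y = rm_plus_strategy(qx), rm_plus_strategy(qy)
        # Payoff of each pure action against the opponent's current strategy.
        ux = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        uy = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        vx = sum(x[i] * ux[i] for i in range(n))
        vy = sum(y[j] * uy[j] for j in range(m))
        # RM+ update: add instantaneous regret, then clip at zero.
        qx = [max(qx[i] + ux[i] - vx, 0.0) for i in range(n)]
        qy = [max(qy[j] + uy[j] - vy, 0.0) for j in range(m)]
        sum_x = [s + p for s, p in zip(sum_x, x)]
        sum_y = [s + p for s, p in zip(sum_y, y)]
    return [s / iterations for s in sum_x], [s / iterations for s in sum_y]


def exploitability(A, x, y):
    """Sum of both players' best-response gains against (x, y); zero at an NE."""
    n, m = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical 2x2 zero-sum game
    x, y = solve_rm_plus(A)
    print(x, y, exploitability(A, x, y))
```

In this sketch the approximate equilibrium is read off the averaged strategies, which is the standard guarantee for RM+. The abstract's claim is stronger: RTRM+ is stated to converge in the last iterate under discrete-time feedback.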

Neural Future-Dependent Online Mirror Descent.

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games.

RM-FSP: Regret Minimization Optimizes Neural Fictitious Self-Play

Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent

Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

An Efficient Deep Reinforcement Learning Algorithm for Solving Imperfect Information Extensive-Form Games.

Optimize Neural Fictitious Self-Play in Regret Minimization Thinking

Efficient Last-iterate Convergence Algorithms in Solving Games

Adaptively Perturbed Mirror Descent for Learning in Games

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

Online Double Oracle

Banker Online Mirror Descent

A Unified Perspective on Deep Equilibrium Finding

Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning

Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent

Decentralized Online Learning for Noncooperative Games in Dynamic Environments

Neural Auto-Curricula

Deep Reinforcement Learning from Self-Play in No-limit Texas Hold'em Poker
