Abstract: Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice. Unlike algorithms such as optimistic gradient descent ascent, which have strong last-iterate and ergodic convergence properties for zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms. Given the importance of last-iterate convergence for numerical optimization and its relevance to modeling real-world learning in games, in this paper we study the last-iterate convergence properties of various popular variants of RM$^+$. First, we show numerically that several practical variants, such as simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ game. We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: we prove that extragradient RM$^{+}$ and smooth Predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted variants of these algorithms, and show that they enjoy linear-rate last-iterate convergence.
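
The three convergence notions named in the abstract can be written out as follows. This is only a sketch under stated assumptions: it measures progress by the duality gap of a matrix game with payoff matrix $A$, where $x$ minimizes and $y$ maximizes $x^\top A y$ over simplices $\Delta_n$ and $\Delta_m$. The symbols $A$, $z^t = (x^t, y^t)$, $C$, and $\rho$ are introduced here for illustration, and the paper itself may use a different measure (for example, distance to the set of Nash equilibria).

$$
\begin{aligned}
\mathrm{Gap}(x, y) &= \max_{y' \in \Delta_m} x^\top A y' \;-\; \min_{x' \in \Delta_n} x'^\top A y, \\
\text{asymptotic last-iterate:}\quad & \lim_{t \to \infty} \mathrm{Gap}(z^t) = 0, \\
\text{best-iterate at rate } 1/\sqrt{t}\text{:}\quad & \min_{1 \le s \le t} \mathrm{Gap}(z^s) = O\!\left(\frac{1}{\sqrt{t}}\right), \\
\text{linear-rate last-iterate:}\quad & \mathrm{Gap}(z^t) \le C\,\rho^{t} \quad \text{for some } C > 0,\ \rho \in (0, 1).
\end{aligned}
$$

A best-iterate bound only says that some iterate up to time $t$ is good, whereas a last-iterate bound says that the current iterate itself is good, which is the stronger statement.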

What problem does this paper attempt to address?

This paper addresses the last-iterate convergence of the regret matching+ (RM+) algorithm and its variants in two-player zero-sum games. Its research objectives are:

1. **Explore the last-iterate behavior of existing RM+ variants**: The paper shows through numerical experiments that several popular variants (simultaneous RM+, alternating RM+, and simultaneous Predictive RM+) fail to converge in the last iterate even on a simple 3×3 game.
2. **Prove the last-iterate convergence of specific RM+ variants**: The paper proves that recently proposed variants based on a smoothing technique, Extragradient RM+ (ExRM+) and Smooth Predictive RM+ (SPRM+), do converge in the last iterate. Specifically:
   - ExRM+ and SPRM+ have asymptotic last-iterate convergence (without a rate).
   - Their best iterate converges at a rate of $O\left(\frac{1}{\sqrt{t}}\right)$.
3. **Introduce a restart mechanism to achieve a linear convergence rate**: The paper introduces restarted variants (Restart ExRM+ and Restart SPRM+) and proves that they achieve a linear last-iterate convergence rate.

### Main problem summary

The core problem of this paper is to characterize the last-iterate convergence properties of RM+ and its variants when solving zero-sum games. RM+ and its variants are the most popular approaches in practice, yet unlike algorithms such as optimistic gradient descent ascent, whose last-iterate and ergodic convergence properties are well established, almost nothing is known about their last-iterate behavior. The paper aims to fill this theoretical gap and to propose algorithm variants with provable last-iterate guarantees.

### Key contributions

1. **Numerical evidence**: Provide numerical evidence that RM+ and its important variants (such as alternating RM+ and Predictive RM+) may not converge asymptotically in the last iterate.
2. **Theoretical proof**: Prove the asymptotic last-iterate convergence of ExRM+ and SPRM+ and give their best-iterate convergence rate.
3. **Restart mechanism**: Propose restarted versions of ExRM+ and SPRM+ and prove their linear last-iterate convergence rate.
4. **Positive results under strict assumptions**: Prove that, under the restrictive condition that the game has a strict Nash equilibrium, RM+ does converge in the last iterate.

Together, these results provide a theoretical basis and practical guidance for understanding and improving regret-matching-based algorithms.
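
To make the last-iterate versus average-iterate distinction concrete, the following is a minimal sketch of simultaneous RM+ in self-play on a matrix game, assuming NumPy is available. The names `simultaneous_rm_plus`, `duality_gap`, and `EXAMPLE_GAME` are hypothetical, and the payoff matrix is an arbitrary illustration, not the 3×3 game used in the paper. The sketch uses the standard RM+ update (accumulate instantaneous regrets, clip at zero, play proportionally to the clipped regrets) and tracks the duality gap of each iterate.

```python
import numpy as np

# Hypothetical 3x3 payoff matrix (NOT the game from the paper).
# The row player picks x to minimize x^T A y; the column player picks y to maximize it.
EXAMPLE_GAME = np.array([
    [1.0, -1.0, 0.0],
    [0.0, 1.0, -1.0],
    [-0.5, 0.0, 1.0],
])


def normalize(regrets):
    """Map a nonnegative regret vector to a strategy; uniform if all regrets are zero."""
    total = regrets.sum()
    if total <= 0:
        return np.full(len(regrets), 1.0 / len(regrets))
    return regrets / total


def duality_gap(A, x, y):
    """Best-response value against x minus best-response value against y."""
    return float(np.max(A.T @ x) - np.min(A @ y))


def simultaneous_rm_plus(A, num_iters):
    """Run simultaneous RM+ for both players.

    Returns the last strategies, the uniformly averaged strategies,
    and the duality gap of every iterate.
    """
    n, m = A.shape
    regrets_x, regrets_y = np.zeros(n), np.zeros(m)
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    gaps = []
    for _ in range(num_iters):
        x, y = normalize(regrets_x), normalize(regrets_y)
        # Both players observe losses computed from the same round (simultaneous play).
        loss_x = A @ y
        loss_y = -(A.T @ x)
        # RM+ update: add instantaneous regret, then clip at zero.
        regrets_x = np.maximum(regrets_x + (x @ loss_x) - loss_x, 0.0)
        regrets_y = np.maximum(regrets_y + (y @ loss_y) - loss_y, 0.0)
        x_sum += x
        y_sum += y
        gaps.append(duality_gap(A, x, y))
    return x, y, x_sum / num_iters, y_sum / num_iters, gaps


if __name__ == "__main__":
    x, y, x_avg, y_avg, gaps = simultaneous_rm_plus(EXAMPLE_GAME, 2000)
    print("last-iterate gap:   ", gaps[-1])
    print("average-iterate gap:", duality_gap(EXAMPLE_GAME, x_avg, y_avg))
```

The regret bound of RM+ guarantees that the gap of the averaged strategies shrinks as the number of iterations grows. No comparable guarantee covers the last iterate `(x, y)`, which is the quantity the paper's numerical experiments examine.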

Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
