Abstract: This paper presents a payoff perturbation technique that introduces strong convexity to players' payoff functions in games. The technique is designed for first-order methods to achieve last-iterate convergence in games where the gradient of the payoff functions is monotone in the strategy profile space and may contain additive noise. Although perturbation is known to facilitate the convergence of learning algorithms, the magnitude of perturbation requires careful adjustment to ensure last-iterate convergence. Previous studies have proposed a scheme in which the magnitude is determined by the distance from a periodically re-initialized anchoring or reference strategy. Building upon this, we propose Gradient Ascent with Boosting Payoff Perturbation, which incorporates a novel perturbation into the underlying payoff function while retaining the scheme of periodically re-initializing the anchoring strategy. This innovation yields faster last-iterate convergence rates than existing payoff-perturbed algorithms, even in the presence of additive noise.

### What problem does this paper attempt to solve?

This paper addresses the problem of achieving **last-iterate convergence** in monotone games. Specifically, the authors propose a new perturbed gradient ascent algorithm, Gradient Ascent with Boosting Payoff Perturbation (GABP), that reaches a Nash equilibrium faster, including when feedback is noisy.

#### Main problem background

1. **Learning in monotone games**:
   - In monotone games, the gradient of the payoff functions is a monotone function on the strategy profile space. This class includes many common game types, such as convex-concave games and zero-sum games.
   - Typical learning algorithms (such as gradient ascent and multiplicative weights update) are only guaranteed to converge to a Nash equilibrium in the average-iterate sense. This can be less than ideal in some applications, such as training generative adversarial networks or fine-tuning large language models.
2. **The need for last-iterate convergence**:
   - Last-iterate convergence means that the updated strategy profile itself converges to a Nash equilibrium, rather than the average of the iterates. This is a stronger guarantee than average-iterate convergence and is better suited to practical applications.
3. **The impact of noisy feedback**:
   - In practice, feedback often contains noise, which places higher demands on learning algorithms. Although some existing optimistic algorithms perform well in the absence of noise, they converge slowly under noisy feedback.

#### Core contributions of the paper

1. **Introduction of enhanced perturbation techniques**:
   - A new perturbation technique is proposed. It stabilizes the learning process by introducing strong convexity and maintains fast convergence in the presence of noise.
2. **Improved algorithm design**:
   - The GABP algorithm is designed by adding an additional perturbation term to the original Adaptively Perturbed Mirror Descent (APMD). This additional term accelerates convergence and gives better performance under noisy feedback.
3. **Theoretical analysis and experimental verification**:
   - It is theoretically proven that the last-iterate convergence rates of GABP under full feedback and noisy feedback are \(\tilde{O}(1/T)\) and \(\tilde{O}(1/T^{1/7})\) respectively, which improve on those of existing payoff-perturbed algorithms.
   - Experimental results show that GABP performs well in both random-payoff games and hard convex-concave games. Under noisy feedback in particular, it significantly outperforms the other algorithms.

#### Summary

This paper achieves last-iterate convergence in monotone games by introducing a new perturbation technique, and it significantly improves the convergence rate under noisy feedback. The result provides theoretical and technical support for developing more efficient and more robust game-learning algorithms.
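To make the anchoring idea concrete, here is a minimal Python sketch of the baseline scheme the abstract attributes to previous studies: gradient ascent with a perturbation proportional to the distance from a periodically re-initialized anchor. It is not the GABP update itself, since the additional perturbation term is not specified in this summary. The unconstrained bilinear zero-sum game, the function name `perturbed_gradient_ascent`, and all parameter values (`eta`, `mu`, `anchor_period`, `noise_std`) are illustrative assumptions.

```python
import numpy as np


def perturbed_gradient_ascent(A, x0, y0, eta=0.1, mu=1.0, anchor_period=50,
                              steps=2000, noise_std=0.0, seed=0):
    """Anchored payoff-perturbed gradient ascent on a bilinear zero-sum game.

    Assumed toy game (not from the paper): player 1 maximizes -x^T A y and
    player 2 maximizes x^T A y, with unconstrained strategies. Setting
    mu=0 recovers plain simultaneous gradient ascent.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    anchor_x, anchor_y = x.copy(), y.copy()

    for t in range(steps):
        # Periodically re-initialize the anchor to the current strategies.
        if t > 0 and t % anchor_period == 0:
            anchor_x, anchor_y = x.copy(), y.copy()

        # Payoff gradients, optionally corrupted by additive Gaussian noise.
        grad_x = -A @ y + noise_std * rng.standard_normal(x.shape)
        grad_y = A.T @ x + noise_std * rng.standard_normal(y.shape)

        # The perturbation pulls each player toward the anchor, which makes
        # the perturbed game strongly monotone between anchor updates.
        x_next = x + eta * (grad_x - mu * (x - anchor_x))
        y_next = y + eta * (grad_y - mu * (y - anchor_y))
        x, y = x_next, y_next  # simultaneous update

    return x, y


def distance_to_equilibrium(x, y):
    # For an invertible A the unique Nash equilibrium of this game is (0, 0).
    return float(np.linalg.norm(np.concatenate([x, y])))


if __name__ == "__main__":
    A = np.array([[1.0, 0.5], [-0.5, 1.0]])
    start = [1.0, 1.0]
    print("perturbed:", distance_to_equilibrium(
        *perturbed_gradient_ascent(A, start, start)))
    print("plain:    ", distance_to_equilibrium(
        *perturbed_gradient_ascent(A, start, start, mu=0.0)))
```

In this noise-free toy setting the last iterate of the perturbed variant approaches the equilibrium, while plain simultaneous gradient ascent (`mu=0.0`) spirals away from it, which is the gap between last-iterate and average-iterate behavior that motivates the paper. Passing a positive `noise_std` gives a rough feel for the noisy-feedback setting, though this sketch makes no claim about the rates stated above.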

Boosting Perturbed Gradient Ascent for Last-Iterate Convergence in Games
