Abstract: In this paper, we propose a passivity-based methodology for the analysis and design of reinforcement learning in multi-agent finite games. Starting from a known exponentially-discounted reinforcement learning scheme, we show convergence to a Nash distribution in the class of games characterized by the monotonicity property of their (negative) payoff. We further exploit passivity to propose a class of higher-order schemes that preserve convergence properties, can improve the speed of convergence, and can even converge in cases where their first-order counterparts fail to converge. We demonstrate these properties through numerical simulations for several representative games.

What problem does this paper attempt to address?

The paper uses passivity techniques to analyze and design reinforcement learning algorithms for multi-agent finite games, with the goal of proving convergence to a Nash distribution. Specifically, it shows that reinforcement learning converges to a Nash distribution in games with the monotonicity property, and it proposes a class of higher-order learning schemes. These schemes retain the convergence property and can improve the convergence speed. In some cases they converge even when the first-order algorithm does not.

### Main contributions of the paper:

1. **Application of the passivity framework**: The paper shows how to use the passivity framework to prove the convergence of reinforcement learning in finite games.
2. **Design of higher-order learning dynamics**: The paper proposes a passivity-based method for designing higher-order learning dynamics that retain the property of converging to a Nash distribution.

### Specific problem description:

- **Limitations of existing methods**: Existing reinforcement learning results mainly address convergence in potential games and pay less attention to stable games. Stable games include zero-sum games, potential games with concave payoffs, etc.
- **Advantages of the new method**: The method proposed in the paper applies not only to potential games but also to a wider range of stable games, such as the Rock-Paper-Scissors game and the Shapley game.

### Technical means:

- **Continuous-time exponentially-discounted learning (EXP-D-RL)**: The paper starts from a known exponentially-discounted reinforcement learning scheme, models it as a continuous-time system, and uses the logit rule to convert scores into mixed strategies.
- **Passivity theory**: The convergence of the learning dynamics is proved using the concept of equilibrium-independent passivity (EIP).
- **Construction of higher-order dynamics**: Higher-order learning dynamics are designed by introducing auxiliary states. A feedback modification keeps the equilibrium points unchanged, which preserves convergence.

### Conclusions:

- **Convergence results**: The paper proves that, in games with the monotonicity property, continuous-time exponentially-discounted learning (EXP-D-RL) converges to a Nash distribution.
- **Superiority of higher-order dynamics**: Higher-order dynamics can improve the convergence speed and, in some cases, converge in a larger class of games than traditional first-order dynamics can handle.

### Examples of mathematical formulas:

- **Monotonicity condition**:
  \[ -(x - x')^\top (U(x) - U(x')) \geq 0, \quad \forall x, x' \in \Delta \]
- **Storage function of higher-order dynamics**:
  \[ V_z(z) = \sum_{p \in N} \left( \mathrm{lse}_p(z_p) - \mathrm{lse}_p(\bar{z}_p) - \nabla \mathrm{lse}_p(\bar{z}_p)^\top (z_p - \bar{z}_p) \right) \]
  where \( \mathrm{lse}_p(z_p) = \epsilon \ln \left( \sum_{j \in A_p} \exp \left( \frac{z_{pj}}{\epsilon} \right) \right) \) is the log-sum-exp function and \( \bar{z}_p \) denotes the equilibrium value of \( z_p \).

Through these technical means, the paper extends the range of multi-agent games in which reinforcement learning provably converges and provides new theoretical tools for analyzing and designing such schemes.
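The first-order scheme and the monotonicity condition described above can be illustrated with a small simulation. The sketch below is not the paper's implementation: it assumes the score dynamics take the common exponentially-discounted form dz_p/dt = U_p(x) - z_p with x_p given by the logit rule at temperature epsilon, uses a standard zero-sum Rock-Paper-Scissors payoff matrix, and integrates with forward Euler. The function names, step size, horizon, and epsilon value are all illustrative choices.

```python
import math

# Rock-Paper-Scissors payoff matrix for the row player (zero-sum game).
A = [[0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]


def matvec(m, v):
    """Plain matrix-vector product for small lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def logit(z, eps):
    """Logit (softmax) rule with temperature eps, computed stably."""
    top = max(z)
    w = [math.exp((z_j - top) / eps) for z_j in z]
    total = sum(w)
    return [w_j / total for w_j in w]


def payoffs(x1, x2):
    """Payoff vectors U_1(x) = A x2 and U_2(x) = -A^T x1 (zero-sum)."""
    u1 = matvec(A, x2)
    a_t = [list(col) for col in zip(*A)]
    u2 = [-v for v in matvec(a_t, x1)]
    return u1, u2


def monotonicity_gap(x, y):
    """Return -(x - y)^T (U(x) - U(y)); nonnegative in a monotone game."""
    ux, uy = payoffs(*x), payoffs(*y)
    gap = 0.0
    for p in range(2):
        for j in range(3):
            gap -= (x[p][j] - y[p][j]) * (ux[p][j] - uy[p][j])
    return gap


def simulate(z1, z2, eps=0.5, dt=0.01, steps=5000):
    """Forward-Euler integration of the ASSUMED dynamics dz_p/dt = U_p(x) - z_p."""
    for _ in range(steps):
        x1, x2 = logit(z1, eps), logit(z2, eps)
        u1, u2 = payoffs(x1, x2)
        z1 = [z + dt * (u - z) for z, u in zip(z1, u1)]
        z2 = [z + dt * (u - z) for z, u in zip(z2, u2)]
    return logit(z1, eps), logit(z2, eps)


x1_final, x2_final = simulate([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(x1_final, x2_final)
```

Under these assumptions the uniform strategy is the fixed point (zero scores map to the uniform strategy, which yields zero payoffs), so both printed strategies should end up close to (1/3, 1/3, 1/3). For a zero-sum game the monotonicity gap is zero up to rounding, which is the boundary case of the condition.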

On Passivity, Reinforcement Learning and Higher-Order Learning in Multi-Agent Finite Games
