Abstract: We present two variants of a multi-agent reinforcement learning algorithm based on evolutionary game theoretic considerations. The intentional simplicity of one variant enables us to prove results on its relationship to a system of ordinary differential equations of replicator-mutator dynamics type, allowing us to present proofs on the algorithm's convergence conditions in various settings via its ODE counterpart. The more complicated variant enables comparisons to Q-learning based algorithms. We compare both variants experimentally to WoLF-PHC and frequency-adjusted Q-learning on a range of settings, illustrating cases of increasing dimensionality where our variants preserve convergence in contrast to more complicated algorithms. The availability of analytic results provides a degree of transferability of results as compared to purely empirical case studies, illustrating the general utility of a dynamical systems perspective on multi-agent reinforcement learning when addressing questions of convergence and reliable generalisation.

What problem does this paper attempt to address?

The paper addresses the problem of convergence in multi-agent reinforcement learning (MARL). The authors propose two variants of a MARL algorithm motivated by evolutionary game theory and relate the simpler variant to a system of ordinary differential equations of replicator-mutator type. Through theoretical analysis and experimental comparison, the paper examines how the two variants behave under different settings. In particular, as dimensionality increases, the proposed variants preserve convergence in cases where more complicated algorithms do not.

The main contributions of the paper are:

1. **Theoretical Analysis**: By linking the simpler variant to replicator-mutator dynamics, the authors prove conditions under which the algorithm converges. Because these results are analytic rather than purely empirical, they transfer to other settings more reliably than individual case studies do.
2. **Experimental Validation**: Experiments on a series of classic games (such as the Prisoner's Dilemma, Matching Pennies, and Rock-Paper-Scissors) compare both variants with known algorithms, namely WoLF-PHC and frequency-adjusted Q-learning (FAQ).

In summary, the paper combines theoretical and experimental methods to study a multi-agent reinforcement learning algorithm that keeps converging as dimensionality increases, a setting in which existing algorithms have difficulty guaranteeing convergence.
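To make the link between learning and replicator-mutator dynamics more concrete, the following is a minimal sketch that numerically integrates a generic single-population replicator equation with an additive mutation term pulling the strategy toward the uniform distribution. This is not the paper's algorithm or its exact ODE: the functional form, the function names (`replicator_mutator_step`, `simulate`), the mutation rate `mu`, the step size `dt`, and the Rock-Paper-Scissors payoff matrix `RPS` are all assumptions chosen for illustration.

```python
import numpy as np

# Standard zero-sum Rock-Paper-Scissors payoffs (row strategy vs column strategy).
# Assumed for illustration; the paper's exact game parameters are not given here.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def replicator_mutator_step(x, A, mu, dt):
    """One explicit Euler step of an assumed replicator-mutator form:

    dx_i/dt = x_i * ((A x)_i - x^T A x) + mu * (1/n - x_i)
    """
    f = A @ x                      # payoff of each pure strategy
    avg = x @ f                    # population-average payoff
    dx = x * (f - avg) + mu * (1.0 / len(x) - x)
    x_new = x + dt * dx
    # Guard against tiny negative values from discretisation, then renormalise.
    x_new = np.clip(x_new, 1e-12, None)
    return x_new / x_new.sum()


def simulate(x0, A, mu, dt=0.01, steps=20000):
    """Integrate the dynamics from x0 and return the final mixed strategy."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = replicator_mutator_step(x, A, mu, dt)
    return x


if __name__ == "__main__":
    # With mu > 0 the trajectory is drawn toward the uniform mixed strategy.
    print(simulate([0.8, 0.1, 0.1], RPS, mu=0.1))
```

In this toy setting the mutation term is what turns the cycling behaviour of plain replicator dynamics in Rock-Paper-Scissors into convergence toward the mixed equilibrium. That is the kind of property an ODE counterpart allows one to analyse, although the paper's own results concern its specific system rather than this sketch.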

Mutation-Bias Learning in Games

Evolution with Opponent-Learning Awareness

Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

A Risk-Averse Equilibrium for Multi-Agent Systems

Non-cooperative Multi-agent Systems with Exploring Agents

Evolutionary Multi-agent Reinforcement Learning in Group Social Dilemmas

Chaos persists in large-scale multi-agent learning despite adaptive learning rates

Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

Multi-Agent Quantum Reinforcement Learning using Evolutionary Optimization

A Single-Task and Multi-Decision Evolutionary Game Model Based on Multi-Agent Reinforcement Learning

Evolutionary Multiplayer Games

Learning, evolution and population dynamics

Evolutionary Game Dynamics of Multi-Agent Cooperation Driven by Self-Learning

Penalty-Regulated Dynamics and Robust Learning Procedures in Games

On Gradient-Based Learning in Continuous Games

Exploring Dominant Strategies in Iterated and Evolutionary Games: a Multi-Agent Reinforcement Learning Approach

Lineage Evolution Reinforcement Learning

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

Exploring the performance of volatile mutations on evolutionary game dynamics in complex networks

Nested replicator dynamics, nested logit choice, and similarity-based learning