Mutation-Bias Learning in Games

Johann Bauer, Sheldon West, Eduardo Alonso, Mark Broom
2024-05-28
Abstract: We present two variants of a multi-agent reinforcement learning algorithm based on evolutionary game theoretic considerations. The intentional simplicity of one variant enables us to prove results on its relationship to a system of ordinary differential equations of replicator-mutator dynamics type, allowing us to present proofs on the algorithm's convergence conditions in various settings via its ODE counterpart. The more complicated variant enables comparisons to Q-learning based algorithms. We compare both variants experimentally to WoLF-PHC and frequency-adjusted Q-learning on a range of settings, illustrating cases of increasing dimensionality where our variants preserve convergence in contrast to more complicated algorithms. The availability of analytic results provides a degree of transferability of results as compared to purely empirical case studies, illustrating the general utility of a dynamical systems perspective on multi-agent reinforcement learning when addressing questions of convergence and reliable generalisation.
Machine Learning, Multiagent Systems, Dynamical Systems, Optimization and Control, Populations and Evolution
What problem does this paper attempt to address?
The paper addresses the problem of convergence in multi-agent reinforcement learning (MARL). The authors propose two variants of a MARL algorithm motivated by evolutionary game theory and relate the simpler variant to a system of ordinary differential equations (ODEs) of replicator-mutator type. Through theoretical analysis and experiments, they examine how both variants behave across a range of settings; in particular, as the dimensionality of the games increases, the proposed variants continue to converge in cases where more complicated algorithms do not. The main contributions of the paper are:
1. **Theoretical Analysis**: By linking the simpler variant to replicator-mutator dynamics, the authors prove convergence conditions for the algorithm in various settings via its ODE counterpart. This allows the algorithm's reliability and generalization to be assessed theoretically rather than only through case-by-case experiments.
2. **Experimental Validation**: On a series of classic games (such as the Prisoner's Dilemma, Matching Pennies, Rock-Paper-Scissors, etc.), both variants are compared experimentally with known algorithms, namely WoLF-PHC and Frequency-Adjusted Q-learning (FAQ).
In summary, the paper combines theoretical and experimental methods to develop multi-agent reinforcement learning algorithms that keep converging as dimensionality grows, addressing the difficulty existing algorithms face in guaranteeing convergence in complex settings.
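To make the idea of an "ODE counterpart" concrete, the following is a minimal sketch, not the authors' algorithm or their exact equations. It integrates one common form of two-player replicator dynamics with a uniform mutation term, `dx_i/dt = x_i * ((A y)_i - x^T A y) + mu * (1/n - x_i)`, on Matching Pennies. The mutation form, the function names (`rm_field`, `simulate`), the rate `mu = 0.1`, the step size and the initial strategies are all assumptions chosen for illustration; the paper's ODE may differ in its details.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product M v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def rm_field(x: Vector, fitness: Vector, mu: float) -> Vector:
    """Hypothetical replicator-mutator vector field for one player's mixed strategy x:
    replicator term x_i (f_i - mean fitness) plus uniform mutation mu (1/n - x_i)."""
    n = len(x)
    avg = sum(xi * fi for xi, fi in zip(x, fitness))
    return [xi * (fi - avg) + mu * (1.0 / n - xi) for xi, fi in zip(x, fitness)]


def simulate(A: Matrix, B: Matrix, x: Vector, y: Vector, mu: float,
             dt: float = 0.01, t_end: float = 200.0) -> Tuple[Vector, Vector]:
    """Forward-Euler integration of the two-player dynamics.
    A and B hold the row and column player's payoffs, both indexed [row][col]."""
    Bt = transpose(B)
    for _ in range(int(round(t_end / dt))):
        fx = mat_vec(A, y)   # payoff of each row strategy against y
        fy = mat_vec(Bt, x)  # payoff of each column strategy against x
        dx = rm_field(x, fx, mu)
        dy = rm_field(y, fy, mu)
        # Each field sums to zero, so the iterates stay on the probability simplex.
        x = [xi + dt * d for xi, d in zip(x, dx)]
        y = [yi + dt * d for yi, d in zip(y, dy)]
    return x, y


# Matching Pennies: zero-sum, unique mixed equilibrium at (1/2, 1/2).
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-a for a in row] for row in A]
x, y = simulate(A, B, [0.9, 0.1], [0.2, 0.8], mu=0.1)
print([round(v, 3) for v in x], [round(v, 3) for v in y])
```

Without mutation (`mu = 0`), replicator dynamics on Matching Pennies orbit the equilibrium without approaching it. Linearising the field above at (1/2, 1/2) gives eigenvalues `-mu ± i`, so in this toy setting any `mu > 0` turns the neutral cycles into a stable spiral and both strategies approach (1/2, 1/2).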