Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning

Batuhan Yardim, Niao He
2024-08-28
Abstract: Mean-field games (MFG) have become significant tools for solving large-scale multi-agent reinforcement learning problems under symmetry. However, the assumption of exact symmetry limits the applicability of MFGs, as real-world scenarios often feature inherent heterogeneity. Furthermore, most works on MFG assume access to a known MFG model, which might not be readily available for real-world finite-agent games. In this work, we broaden the applicability of MFGs by providing a methodology to extend any finite-player, possibly asymmetric, game to an "induced MFG". First, we prove that $N$-player dynamic games can be symmetrized and smoothly extended to the infinite-player continuum via explicit Kirszbraun extensions. Next, we propose the notion of $\alpha,\beta$-symmetric games, a new class of dynamic population games that incorporate approximate permutation invariance. For $\alpha,\beta$-symmetric games, we establish explicit approximation bounds, demonstrating that a Nash policy of the induced MFG is an approximate Nash of the $N$-player dynamic game. We show that TD learning converges up to a small bias using trajectories of the $N$-player game with finite-sample guarantees, permitting symmetrized learning without building an explicit MFG model. Finally, for certain games satisfying monotonicity, we prove a sample complexity of $\widetilde{\mathcal{O}}(\varepsilon^{-6})$ for the $N$-agent game to learn an $\varepsilon$-Nash up to symmetrization bias. Our theory is supported by evaluations on MARL benchmarks with thousands of agents.
Computer Science and Game Theory, Machine Learning, Optimization and Control
What problem does this paper attempt to address?
This paper addresses two main problems in multi-agent reinforcement learning (MARL):

1. **Approximate symmetry problem**: Traditional mean-field games (MFG) assume exact symmetry among agents, that is, all agents share the same reward function and dynamics. In real-world settings, symmetry is usually only approximate, and agents are heterogeneous to some degree. The exact symmetry assumption therefore limits the applicability of MFGs.
2. **Unknown model problem**: Many works on MFG assume access to a known MFG model, but such a model may not be readily available for real-world finite-agent games. How to learn effectively without an explicit MFG model is therefore an important research problem.

To solve these problems, the paper proposes the following methods:

- **Extending finite-player games to an induced MFG**: Using explicit Kirszbraun extensions, finite-player, possibly asymmetric, games are symmetrized and smoothly extended to the infinite-player continuum, yielding an "induced MFG". This makes MFG theory usable as an approximation even when no explicit MFG model is given.
- **Defining α,β-symmetric games**: A new class of dynamic population games, called α,β-symmetric games, is introduced. These games incorporate approximate permutation invariance. For α,β-symmetric games, the paper establishes explicit approximation bounds and proves that a Nash policy of the induced MFG is an approximate Nash policy of the N-player dynamic game.
- **Convergence analysis of TD learning**: TD learning on trajectories of the N-player game is shown to converge up to a small bias, with finite-sample guarantees. This permits symmetrized learning without building an explicit MFG model.
- **Policy mirror descent (PMD) under monotonicity conditions**: A policy mirror descent algorithm combined with TD learning is proposed, and its convergence is proved under monotonicity conditions, providing end-to-end learning guarantees.

Specifically, the main contributions of the paper are:

1. Constructing an MFG approximation framework that applies to any (possibly asymmetric) finite-player dynamic game.
2. Defining a new class of α,β-symmetric dynamic games in which approximate Nash equilibria can be found efficiently.
3. Proving that a solution of the induced MFG is an approximate Nash equilibrium of the original α,β-symmetric dynamic game, which shows that the MFG approximation is robust to heterogeneity and finite-player errors.
4. Analyzing TD learning on trajectories of the finite-player dynamic game, and proving that policies on the abstract MFG can be evaluated using only a finite number of samples.
5. Proving that, under monotonicity conditions, PMD combined with TD learning converges to an approximate Nash equilibrium, which gives learning guarantees for large-scale multi-agent games.

These contributions provide a theoretical basis and practical methods for handling approximate symmetry and unknown models in large-scale multi-agent reinforcement learning.
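The idea of evaluating a shared policy directly from N-player trajectories, without building an MFG model, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the crowd-aversion game, the cyclic dynamics, the per-agent reward offsets that stand in for heterogeneity, and all function and parameter names are hypothetical. The update is a plain SARSA-style tabular TD(0) rule, used here as a stand-in rather than the paper's estimator.

```python
import numpy as np

def empirical_distribution(states, n_states):
    """Fraction of agents in each state: a finite-N stand-in for the mean field.
    It depends only on counts, so it is invariant to relabeling the agents."""
    return np.bincount(states, minlength=n_states) / len(states)

def step(states, actions, offsets, n_states):
    """Hypothetical toy dynamics: action 0 stays, action 1 moves one state right
    (cyclically). Rewards penalize crowding; `offsets` adds a small per-agent
    perturbation so the game is only approximately symmetric."""
    mu = empirical_distribution(states, n_states)
    rewards = -mu[states] + offsets
    next_states = (states + actions) % n_states
    return next_states, rewards

def td0_q_evaluation(policy, n_agents=200, n_states=5, n_steps=2000,
                     gamma=0.9, lr=0.05, hetero=0.01, seed=0):
    """Tabular TD(0) evaluation of a policy shared by all agents, using only
    agent 0's transitions observed inside the N-player game."""
    rng = np.random.default_rng(seed)
    offsets = hetero * rng.uniform(-1.0, 1.0, size=n_agents)
    states = rng.integers(n_states, size=n_agents)
    q = np.zeros((n_states, 2))

    def sample_actions(s):
        # policy[s, 1] is the probability of choosing action 1 in state s
        return (rng.random(len(s)) < policy[s, 1]).astype(int)

    actions = sample_actions(states)
    for _ in range(n_steps):
        next_states, rewards = step(states, actions, offsets, n_states)
        next_actions = sample_actions(next_states)
        s, a, r = states[0], actions[0], rewards[0]
        s2, a2 = next_states[0], next_actions[0]
        # On-policy TD(0) update; no transition or reward model is estimated.
        q[s, a] += lr * (r + gamma * q[s2, a2] - q[s, a])
        states, actions = next_states, next_actions
    return q

uniform_policy = np.full((5, 2), 0.5)
q = td0_q_evaluation(uniform_policy)
print(q.round(3))
```

In this sketch the learner never sees the population dynamics explicitly; the crowding effect enters only through the rewards sampled along one agent's trajectory. The `hetero` parameter controls how far the toy game departs from exact symmetry, so setting it to zero recovers a fully symmetric game.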