Abstract: Multiagent reinforcement learning (MARL) has recently attracted considerable attention from both academics and practitioners. Core issues, e.g., the curse of dimensionality due to the exponential growth of agent interactions and nonstationary environments due to simultaneous learning, hinder the large-scale proliferation of MARL. These problems worsen as the number of agents increases. To address these challenges, we propose an adversarial collaborative learning method in a mixed cooperative–competitive environment, exploiting friend-or-foe Q-learning and mean-field theory. We first treat the neighbors of agent $i$ as two coalitions ($i$'s friend coalition and opponent coalition, respectively), and convert the Markov game into a two-player zero-sum game with an extended action set. By exploiting mean-field theory, this new game simplifies the interactions to those between a single agent and the mean effects of its friends and opponents. A neural network is employed to learn the optimal mean effects of these two coalitions, which are trained via adversarial max and min steps. In the max step, with the opponents' policies fixed, we optimize the friends' mean action to maximize their rewards. In the min step, with the friends' policies frozen, the opponents' mean action is trained to minimize the friends' rewards. These two steps are proved to converge to a Nash equilibrium. Then, another neural network is applied to learn the best response of each agent toward the mean effects. Finally, the adversarial max and min steps jointly optimize the two networks. Experiments on two platforms demonstrate the learning effectiveness and strength of our approach, especially with many agents.
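The alternating max and min steps over mean actions can be illustrated on a toy problem. The sketch below is not the method described in the abstract: it replaces the neural networks with classical fictitious play on a small zero-sum matrix game, where each coalition best-responds to the other's empirical mean action. In two-player zero-sum matrix games the empirical frequencies of fictitious play are known to approach a Nash equilibrium. The function names (`mean_action`, `max_step`, `min_step`, `alternate_max_min`) and the matching-pennies payoff matrix are assumptions made for illustration only.

```python
import numpy as np


def mean_action(actions, n_actions):
    """Mean of one-hot encoded neighbor actions, i.e. an empirical action distribution."""
    one_hot = np.eye(n_actions)[np.asarray(actions)]
    return one_hot.mean(axis=0)


def max_step(payoff, foe_mean):
    """Opponents' mean action fixed: pick the friend action with the highest expected payoff."""
    return int(np.argmax(payoff @ foe_mean))


def min_step(payoff, friend_mean):
    """Friends' mean action fixed: pick the opponent action that minimizes the friends' payoff."""
    return int(np.argmin(friend_mean @ payoff))


def alternate_max_min(payoff, steps=5000):
    """Fictitious-play stand-in for the adversarial max and min steps (illustrative assumption).

    payoff[a, b] is the friends' reward when friends play a and opponents play b;
    the opponents receive the negative of it (zero-sum).
    """
    n_friend, n_foe = payoff.shape
    # Start from a uniform pseudo-count so the first mean actions are well defined.
    friend_counts = np.ones(n_friend)
    foe_counts = np.ones(n_foe)
    for _ in range(steps):
        friend_mean = friend_counts / friend_counts.sum()
        foe_mean = foe_counts / foe_counts.sum()
        a = max_step(payoff, foe_mean)      # max step against the frozen opponents
        b = min_step(payoff, friend_mean)   # min step against the frozen friends
        friend_counts[a] += 1
        foe_counts[b] += 1
    return friend_counts / friend_counts.sum(), foe_counts / foe_counts.sum()


if __name__ == "__main__":
    # Hypothetical example: matching pennies, whose unique equilibrium is uniform play.
    pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
    friend_mean, foe_mean = alternate_max_min(pennies)
    print("friend mean action:", friend_mean)
    print("opponent mean action:", foe_mean)
    print("neighbor mean action:", mean_action([0, 1, 1, 2], n_actions=3))
```

Here the mean action plays the role of the "mean effect" of a coalition: each side only needs the other's averaged behavior, not every individual opponent's action, which is what keeps the interaction tractable as the number of agents grows.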

Neural Auto-Curricula in Two-Player Zero-Sum Games.

Neural Auto-Curricula

Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning

Monte Carlo Neural Fictitious Self-Play: Achieve Approximate Nash equilibrium of Imperfect-Information Games.

Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games

Neural Population Learning beyond Symmetric Zero-sum Games

A Generalized Training Approach for Multiagent Learning

Learning in Nonzero-Sum Stochastic Games with Potentials

MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash equilibrium of Imperfect-Information Games

Towards convergence to Nash equilibria in two-team zero-sum games

Automatic Curriculum Generation for Reinforcement Learning in Zero-Sum Games

A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games

From mimic to counteract: a two-stage reinforcement learning algorithm for Google research football

Modeling opponent learning in multiagent repeated games

Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Multiagent Adversarial Collaborative Learning via Mean-Field Theory