Abstract: Multiagent reinforcement learning (MARL) has recently attracted considerable attention from both academics and practitioners. Core issues, e.g., the curse of dimensionality caused by the exponential growth of agent interactions and the nonstationary environments caused by simultaneous learning, hinder the large-scale proliferation of MARL. These problems worsen as the number of agents increases. To address these challenges, we propose an adversarial collaborative learning method for mixed cooperative–competitive environments that exploits friend-or-foe Q-learning and mean-field theory. We first treat the neighbors of agent $i$ as two coalitions ($i$'s friend coalition and opponent coalition, respectively) and convert the Markov game into a two-player zero-sum game with an extended action set. By exploiting mean-field theory, this new game reduces the interactions to those between a single agent and the mean effects of its friends and opponents. A neural network is employed to learn the optimal mean effects of these two coalitions, and it is trained via adversarial max and min steps. In the max step, with the opponents' policies fixed, we optimize the friends' mean action to maximize their rewards.
In the min step, with the friends' policies frozen, the opponents' mean action is trained to minimize the friends' rewards. These two steps are proved to converge to a Nash equilibrium. Then, another neural network is applied to learn the best response of each agent to the mean effects. Finally, the two networks can be jointly optimized through the adversarial max and min steps. Experiments on two platforms demonstrate the learning effectiveness and strength of our approach, especially with many agents.
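The two mechanisms in the abstract, the mean-field reduction of agent $i$'s neighbors into a friend mean action and an opponent mean action, and the adversarial max/min steps, can be illustrated with a small toy sketch. This is not the authors' implementation: the paper trains neural networks, whereas here the friends' mean action `x` and the opponents' mean action `y` are scalars, and `reward` is a hypothetical friends' reward chosen to be concave in `x` and convex in `y` so that it has a unique saddle point (the Nash equilibrium of the zero-sum game) at `(0.4, -1.2)`. Averaging one-hot neighbor actions follows the usual mean-field MARL convention and is an assumption for this paper. The helper names, learning rate, and iteration count are all hypothetical.

```python
"""Toy sketch (hypothetical, not the paper's code) of two ideas from the abstract.

1. Mean-field reduction: agent i only sees the averaged one-hot action of its
   friend coalition and of its opponent coalition.
2. Adversarial max/min steps on a hypothetical friends' reward r(x, y), where x is
   the friends' mean action and y is the opponents' mean action.
"""
from typing import Dict, List, Sequence, Tuple


def one_hot(a: int, n_actions: int) -> List[float]:
    """Encode a discrete action as a one-hot vector."""
    v = [0.0] * n_actions
    v[a] = 1.0
    return v


def coalition_mean_actions(
    actions: Dict[int, int],
    friends: Sequence[int],
    opponents: Sequence[int],
    n_actions: int,
) -> Tuple[List[float], List[float]]:
    """Average the one-hot actions of agent i's friends and opponents separately."""

    def mean_of(group: Sequence[int]) -> List[float]:
        if not group:  # an empty coalition contributes a zero mean action
            return [0.0] * n_actions
        total = [0.0] * n_actions
        for j in group:
            for k, val in enumerate(one_hot(actions[j], n_actions)):
                total[k] += val
        return [t / len(group) for t in total]

    return mean_of(friends), mean_of(opponents)


# Hypothetical friends' reward: concave in x (friends), convex in y (opponents),
# so it has a unique saddle point at (x, y) = (0.4, -1.2).
def reward(x: float, y: float) -> float:
    return -(x - 1.0) ** 2 + x * y + (y + 1.0) ** 2


def d_reward_dx(x: float, y: float) -> float:
    return -2.0 * (x - 1.0) + y


def d_reward_dy(x: float, y: float) -> float:
    return x + 2.0 * (y + 1.0)


def max_min_steps(x: float, y: float, lr: float = 0.1, iters: int = 500) -> Tuple[float, float]:
    """Alternate a gradient-ascent max step on x (y frozen) with a
    gradient-descent min step on y (x frozen)."""
    for _ in range(iters):
        x = x + lr * d_reward_dx(x, y)  # max step: friends raise their reward
        y = y - lr * d_reward_dy(x, y)  # min step: opponents lower the friends' reward
    return x, y


if __name__ == "__main__":
    acts = {1: 0, 2: 2, 3: 2, 4: 1}  # neighbor id -> discrete action
    mf, mo = coalition_mean_actions(acts, friends=[1, 2], opponents=[3, 4], n_actions=3)
    print(mf, mo)  # [0.5, 0.0, 0.5] [0.0, 0.5, 0.5]
    x, y = max_min_steps(0.0, 0.0)
    print(round(x, 4), round(y, 4))  # 0.4 -1.2
```

The alternating iteration converges here only because the toy reward is strongly concave in `x` and strongly convex in `y`; on a bilinear payoff such as `x * y`, the same alternating gradient steps can cycle rather than converge, so the convergence result stated in the abstract rests on the paper's own analysis, not on this sketch.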

A Weighted Mean Field Reinforcement Learning Algorithm for Large-Scale Multi-Agent Collaboration

GAT-MF: Graph Attention Mean Field for Very Large Scale Multi-Agent Reinforcement Learning

Very Large Scale Multi-Agent Reinforcement Learning with Graph Attention Mean Field

Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention

Weighted Mean Field Reinforcement Learning for Large-Scale UAV Swarm Confrontation

Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network Approach

Depthwise Convolution for Multi-Agent Communication With Enhanced Mean-Field Approximation

Evolutionary reinforcement learning algorithm for large-scale multi-agent cooperation and confrontation applications

Multi Type Mean Field Reinforcement Learning

Attention Enhanced Reinforcement Learning for Multi agent Cooperation

Multiagent Adversarial Collaborative Learning via Mean-Field Theory

Mean-Field Multiagent Reinforcement Learning: A Decentralized Network Approach

Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments

Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning

Reinforcement learning for multi-agent formation navigation with scalability

Mean Field Multi-Agent Reinforcement Learning Method for Area Traffic Signal Control

Major-Minor Mean Field Multi-Agent Reinforcement Learning

Efficient Distributed Framework for Collaborative Multi-Agent Reinforcement Learning

Decentralized Optimal Tracking Control for Large-scale Multi-Agent Systems under Complex Environment: A Constrained Mean Field Game with Reinforcement Learning Approach

Scalable and Transferable Reinforcement Learning for Multi-Agent Mixed Cooperative–Competitive Environments Based on Hierarchical Graph Attention

Individual Reward Assisted Multi-Agent Reinforcement Learning