Abstract: Multiagent reinforcement learning (MARL) has recently attracted considerable attention from both academics and practitioners. Core issues, e.g., the curse of dimensionality caused by the exponential growth of agent interactions and the nonstationary environments caused by simultaneous learning, hinder the large-scale adoption of MARL. These problems worsen as the number of agents increases. To address these challenges, we propose an adversarial collaborative learning method for mixed cooperative–competitive environments that exploits friend-or-foe Q-learning and mean-field theory. We first divide the neighbors of agent $i$ into two coalitions ($i$'s friend coalition and opponent coalition) and convert the Markov game into a two-player zero-sum game with an extended action set. Using mean-field theory, this new game reduces the interactions to those between a single agent and the mean effects of its friends and opponents. A neural network, trained via adversarial max and min steps, is employed to learn the optimal mean effects of these two coalitions. In the max step, with the opponents' policies fixed, we optimize the friends' mean action to maximize their rewards. In the min step, with the friends' policies frozen, the opponents' mean action is trained to minimize the friends' rewards. These two steps are proved to converge to a Nash equilibrium. Then, another neural network is applied to learn each agent's best response to the mean effects. Finally, the two networks are jointly optimized by the adversarial max and min steps. Experiments on two platforms demonstrate the learning effectiveness and strength of our approach, especially with many agents.
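The mean-field reduction described in the abstract can be illustrated with a minimal sketch, assuming discrete actions encoded as one-hot vectors and a friend/opponent split of agent $i$'s neighbors that is already known. The function names `coalition_mean_actions` and `extended_input` are hypothetical and do not come from the paper. The sketch only shows why the approach scales: the input a Q-network would see, (own action, friends' mean action, opponents' mean action), has a fixed size no matter how many neighbors agent $i$ has.

```python
from typing import Dict, List, Set, Tuple


def one_hot(action: int, num_actions: int) -> List[float]:
    """Encode a discrete action as a one-hot vector."""
    vec = [0.0] * num_actions
    vec[action] = 1.0
    return vec


def coalition_mean_actions(
    neighbor_actions: Dict[int, int], friends: Set[int], num_actions: int
) -> Tuple[List[float], List[float]]:
    """Average the one-hot actions of agent i's friends and opponents separately.

    Assumption of this sketch: every neighbor not listed in `friends` is an opponent.
    An empty coalition yields an all-zero mean vector.
    """
    sums = {True: [0.0] * num_actions, False: [0.0] * num_actions}
    counts = {True: 0, False: 0}
    for agent_id, action in neighbor_actions.items():
        is_friend = agent_id in friends
        sums[is_friend][action] += 1.0  # same as adding the one-hot vector
        counts[is_friend] += 1

    def mean(key: bool) -> List[float]:
        n = counts[key]
        return [s / n for s in sums[key]] if n else sums[key]

    return mean(True), mean(False)


def extended_input(
    own_action: int, friend_mean: List[float], foe_mean: List[float], num_actions: int
) -> List[float]:
    """Concatenate (own action, friend mean action, opponent mean action).

    The length is always 3 * num_actions, independent of the neighbor count.
    """
    return one_hot(own_action, num_actions) + friend_mean + foe_mean


if __name__ == "__main__":
    friend_mean, foe_mean = coalition_mean_actions(
        {1: 0, 2: 2, 3: 2, 4: 1}, friends={1, 2}, num_actions=3
    )
    print(friend_mean)  # [0.5, 0.0, 0.5]
    print(foe_mean)     # [0.0, 0.5, 0.5]
    print(extended_input(1, friend_mean, foe_mean, 3))
    # [0.0, 1.0, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.5]
```

With 4 neighbors or 400, the vector passed on has 9 entries for 3 actions; this fixed size is what replaces the exponential joint-action space.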
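The idea behind the adversarial max and min steps can be illustrated on a toy two-player zero-sum matrix game. This is a stand-in, not the paper's algorithm: it replaces neural-network training with multiplicative-weights (Hedge) updates, treats rows as hypothetical discretized friend mean actions and columns as hypothetical opponent mean actions, and uses an invented payoff matrix. In the max step, rows that pay the friends more against the current opponent strategy gain weight; in the min step, columns that lower that payoff gain weight. Both steps hold the other side at its current strategy (a simultaneous update), and the time-averaged strategies approach a Nash equilibrium, with `lower` and `upper` bracketing the game value.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def hedge_update(probs: List[float], gains: List[float], eta: float) -> List[float]:
    """Multiplicative-weights step: scale each probability by exp(eta * gain), renormalize."""
    new = [p * math.exp(eta * g) for p, g in zip(probs, gains)]
    total = sum(new)
    return [w / total for w in new]


def solve_zero_sum(
    A: Matrix, iters: int = 10_000
) -> Tuple[List[float], List[float], float, float]:
    """Approximate the equilibrium of a zero-sum game in which the row player maximizes A.

    Returns the time-averaged row and column strategies and a (lower, upper)
    bracket on the game value.
    """
    m, n = len(A), len(A[0])
    lo = min(min(row) for row in A)
    hi = max(max(row) for row in A)
    scale = (hi - lo) or 1.0  # normalize gains to [0, 1]; guard constant matrices
    eta_x = math.sqrt(8 * math.log(m) / iters)  # standard Hedge step size
    eta_y = math.sqrt(8 * math.log(n) / iters)
    x = [1.0 / m] * m  # friends (maximizer), start uniform
    y = [1.0 / n] * n  # opponents (minimizer), start uniform
    x_avg, y_avg = [0.0] * m, [0.0] * n
    for _ in range(iters):
        x_avg = [a + p / iters for a, p in zip(x_avg, x)]
        y_avg = [a + q / iters for a, q in zip(y_avg, y)]
        # Max step: payoff of each row against the current (fixed) opponent strategy.
        row_gain = [(sum(A[i][j] * y[j] for j in range(n)) - lo) / scale for i in range(m)]
        # Min step: how far each column lowers the payoff against the current friend strategy.
        col_gain = [(hi - sum(A[i][j] * x[i] for i in range(m))) / scale for j in range(n)]
        x = hedge_update(x, row_gain, eta_x)
        y = hedge_update(y, col_gain, eta_y)
    # Worst case for the averaged friend strategy, best reply to the averaged opponents.
    lower = min(sum(A[i][j] * x_avg[i] for i in range(m)) for j in range(n))
    upper = max(sum(A[i][j] * y_avg[j] for j in range(n)) for i in range(m))
    return x_avg, y_avg, lower, upper


if __name__ == "__main__":
    A = [[3.0, -1.0], [-2.0, 1.0]]  # hypothetical payoffs to the friend coalition
    x, y, lower, upper = solve_zero_sum(A)
    print(f"friends: {x}, opponents: {y}")
    print(f"value in [{lower:.3f}, {upper:.3f}]  (exact value 1/7 = {1/7:.3f})")
```

For this 2×2 matrix the exact value is 1/7, with the friends putting weight 3/7 on the first row and the opponents weight 2/7 on the first column. The bracket always contains the value by the minimax theorem, and the standard Hedge regret bound keeps `upper - lower` below roughly 0.06 after 10,000 iterations.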

Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition

Adaptive Mean Field Multi-Agent Reinforcement Learning

Multiagent Reinforcement Learning for Strictly Constrained Tasks Based on Reward Recorder

Observer-Based Multiagent Deep Reinforcement Learning: A Fully Distributed Training Scheme

Attention Based Large Scale Multi-agent Reinforcement Learning

A Weighted Mean Field Reinforcement Learning Algorithm for Large-Scale Multi-Agent Collaboration

GAT-MF: Graph Attention Mean Field for Very Large Scale Multi-Agent Reinforcement Learning

Mean-Field Multiagent Reinforcement Learning: A Decentralized Network Approach

Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention

Very Large Scale Multi-Agent Reinforcement Learning with Graph Attention Mean Field

Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

AVD-Net: Attention Value Decomposition Network for Deep Multi-Agent Reinforcement Learning

Multiagent Adversarial Collaborative Learning via Mean-Field Theory

Decentralized Multi-agent Reinforcement Learning with Multi-time Scale of Decision Epochs

Major-Minor Mean Field Multi-Agent Reinforcement Learning

Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

Multi-agent reinforcement learning with synchronized and decomposed reward automaton synthesized from reactive temporal logic

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Depthwise Convolution for Multi-Agent Communication With Enhanced Mean-Field Approximation

Multi-agent Dueling Q-learning with Mean Field and Value Decomposition