Abstract: Multiagent reinforcement learning (MARL) has recently attracted considerable attention from both academics and practitioners. Core issues, e.g., the curse of dimensionality due to the exponential growth of agent interactions and nonstationary environments due to simultaneous learning, hinder the large-scale proliferation of MARL. These problems worsen as the number of agents increases. To address these challenges, we propose an adversarial collaborative learning method in a mixed cooperative–competitive environment, exploiting friend-or-foe Q-learning and mean-field theory. We first treat the neighbors of agent $i$ as two coalitions ($i$'s friend coalition and opponent coalition, respectively) and convert the Markov game into a two-player zero-sum game with an extended action set. By exploiting mean-field theory, this new game reduces the interactions to those between a single agent and the mean effects of its friends and opponents. A neural network is employed to learn the optimal mean effects of these two coalitions, which are trained via adversarial max and min steps. In the max step, with the opponents' policies fixed, we optimize the friends' mean action to maximize their rewards.
In the min step, the mean action of the opponents is trained to minimize the friends' rewards while the friends' policies are frozen. These two steps are proved to converge to a Nash equilibrium. Then, another neural network is applied to learn the best response of each agent toward the mean effects. Finally, the adversarial max and min steps can jointly optimize the two networks. Experiments on two platforms demonstrate the learning effectiveness and strength of our approach, especially with many agents.
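
The alternating max and min steps described above can be illustrated with a small toy sketch. This is not the paper's algorithm or network architecture: the function names `mean_action`, `toy_reward`, and `adversarial_steps` are hypothetical, the mean actions are reduced to scalars `f` (friends) and `o` (opponents), and the neural networks are replaced by an assumed closed-form reward that is concave in `f` and convex in `o`, so that its saddle point at (0, 0) plays the role of the equilibrium.

```python
import numpy as np


def mean_action(actions, n_actions):
    """Mean of one-hot encoded discrete actions of a coalition,
    i.e. the empirical action distribution of its members."""
    return np.eye(n_actions)[np.asarray(actions)].mean(axis=0)


def toy_reward(f, o):
    """Assumed stand-in for the friends' reward: concave in f,
    convex in o, with a saddle point at (f, o) = (0, 0)."""
    return -f**2 + f * o + o**2


def adversarial_steps(f=1.0, o=-1.0, lr=0.1, n_rounds=200):
    """Alternate a max step (friends) and a min step (opponents)."""
    for _ in range(n_rounds):
        # Max step: opponents fixed, friends ascend d(reward)/df = -2f + o.
        f += lr * (-2.0 * f + o)
        # Min step: friends fixed, opponents descend d(reward)/do = f + 2o.
        o -= lr * (f + 2.0 * o)
    return f, o


if __name__ == "__main__":
    # Four neighbors choosing among three discrete actions.
    print(mean_action([0, 1, 1, 2], n_actions=3))
    f_star, o_star = adversarial_steps()
    print(f_star, o_star, toy_reward(f_star, o_star))
```

With this toy reward the alternating updates contract toward the saddle point. In the setting the abstract describes, each step would instead update network parameters, and convergence rests on the paper's own proof rather than on this simplified example.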

Multi-Agent Mean Field Predict Reinforcement Learning

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

A Weighted Mean Field Reinforcement Learning Algorithm for Large-Scale Multi-Agent Collaboration

Adaptive Mean Field Multi-Agent Reinforcement Learning

Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention

Dueling Network Architecture for Multi-Agent Deep Deterministic Policy Gradient

Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition

Multi Type Mean Field Reinforcement Learning

Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network Approach

Depthwise Convolution for Multi-Agent Communication With Enhanced Mean-Field Approximation

Attention Based Large Scale Multi-agent Reinforcement Learning

Multi-Agent Reinforcement Learning Algorithm Based On Neural Networks

Mean-Field Multiagent Reinforcement Learning: A Decentralized Network Approach

GAT-MF: Graph Attention Mean Field for Very Large Scale Multi-Agent Reinforcement Learning

LMRL: a Multi-Agent Reinforcement Learning Model and Algorithm

Reinforcement learning for multi-agent formation navigation with scalability

Major-Minor Mean Field Multi-Agent Reinforcement Learning

Multi-agent Reinforcement Learning Algorithm Based on Local Information

Multi-agent collaboration based on RGMAAC algorithm under partial observability

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Multiagent Adversarial Collaborative Learning via Mean-Field Theory