Abstract: Existing value-factorization-based Multi-Agent deep Reinforcement Learning (MARL) approaches perform well in various multi-agent cooperative environments under the centralized training and decentralized execution (CTDE) scheme, where all agents are trained together by a centralized value network and each agent executes its policy independently. However, an issue remains open: in the centralized training process, when the environment for the team is partially observable or non-stationary, i.e., the observation and action information of all the agents cannot represent the global states, existing methods perform poorly and are sample-inefficient. Regret Minimization (RM) can be a promising approach, as it performs well in partially observable and fully competitive settings. However, it tends to model others as opponents and thus cannot work well under the CTDE scheme. In this work, we propose a novel team RM-based Bayesian MARL approach with three key contributions: (a) we design a novel RM method to train cooperative agents as a team and obtain a team regret-based policy for that team; (b) we introduce a novel method to decompose the team regret to generate the policy for each agent for decentralized execution; (c) to further improve the performance, we leverage a differential particle filter (a Sequential Monte Carlo method) network to get an accurate estimation of the state for each agent. Experimental results on two-step matrix games (cooperative games) and battle games (large-scale mixed cooperative-competitive games) demonstrate that our algorithm significantly outperforms state-of-the-art methods.
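The regret-minimization idea behind contribution (a) can be illustrated with classic regret matching, where each action is played in proportion to its positive cumulative regret. The following is a minimal, generic sketch under stated assumptions, not the authors' team RM method or their regret decomposition: it treats the whole team as a single hypothetical player whose actions are the agents' joint actions, and it uses a fixed, made-up payoff vector. The names `regret_matching_policy`, `update_regret`, and `team_payoff` are illustrative only.

```python
import numpy as np


def regret_matching_policy(cumulative_regret: np.ndarray) -> np.ndarray:
    """Map cumulative regrets to a mixed policy via regret matching."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        # Play each action in proportion to its positive cumulative regret.
        return positive / total
    # No action has positive regret: fall back to the uniform policy.
    return np.full(cumulative_regret.shape, 1.0 / cumulative_regret.size)


def update_regret(cumulative_regret, action_values, policy):
    """Add the instantaneous regret r(a) = Q(a) - sum_b pi(b) Q(b) for every action."""
    expected_value = float(np.dot(policy, action_values))
    return cumulative_regret + (action_values - expected_value)


# Hypothetical team payoff over the four joint actions (a1, a2) of two agents
# (assumption for illustration): 0 = (0, 0), 1 = (0, 1), 2 = (1, 0), 3 = (1, 1).
team_payoff = np.array([8.0, -12.0, -12.0, 0.0])

regret = np.zeros_like(team_payoff)
for _ in range(100):
    policy = regret_matching_policy(regret)
    regret = update_regret(regret, team_payoff, policy)

print(policy)  # → [1. 0. 0. 0.]
```

With this fixed payoff, the team policy concentrates on joint action 0 after a few iterations. In genuine multi-player games, the convergence guarantees of regret matching concern the time-averaged policy rather than the last iterate; the last iterate is shown here only because the payoff never changes.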

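Contribution (c) relies on a particle filter, i.e., a Sequential Monte Carlo estimator of a hidden state. To illustrate only the underlying predict, weight, and resample loop, here is a standard bootstrap particle filter for an assumed one-dimensional random-walk state with Gaussian observation noise. It is not the paper's differential particle filter network; a learned, differentiable variant would typically replace the hand-specified models with neural networks and the hard multinomial resampling step with a differentiable approximation. The function name `bootstrap_particle_filter` and all of its parameters are hypothetical.

```python
import numpy as np


def bootstrap_particle_filter(observations, num_particles=1000,
                              process_std=1.0, obs_std=1.0, seed=0):
    """Estimate a 1-D hidden state from noisy observations with a bootstrap filter.

    Assumed model (illustrative only):
        x_t = x_{t-1} + N(0, process_std^2)   (random-walk transition)
        y_t = x_t + N(0, obs_std^2)           (Gaussian observation)
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, size=num_particles)  # samples from the prior
    estimates = []
    for y in observations:
        # Predict: push every particle through the transition model.
        particles = particles + rng.normal(0.0, process_std, size=num_particles)
        # Update: weight particles by the observation likelihood p(y | x),
        # computed in log space and shifted by the max for numerical stability.
        log_lik = -0.5 * ((y - particles) / obs_std) ** 2
        weights = np.exp(log_lik - log_lik.max())
        weights /= weights.sum()
        # Posterior-mean estimate of the state at this step.
        estimates.append(float(np.dot(weights, particles)))
        # Resample (multinomial) so that all particles carry equal weight again.
        idx = rng.choice(num_particles, size=num_particles, p=weights)
        particles = particles[idx]
    return np.array(estimates)


# Hypothetical usage: 50 observations that all read 5.0.
estimates = bootstrap_particle_filter(np.full(50, 5.0))
print(estimates[-1])
```

Resampling at every step keeps the sketch short; a common alternative resamples only when the effective sample size drops below a threshold.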
Intrinsic Action Tendency Consistency for Cooperative Multi-Agent Reinforcement Learning

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning

Priority over Quantity: A Self-Incentive Credit Assignment Scheme for Cooperative Multiagent Reinforcement Learning

Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning

Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

Reinforcement learning for encouraging cooperation in a multiagent system

Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning

Tacit Learning with Adaptive Information Selection for Cooperative Multi-Agent Reinforcement Learning

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

Consciousness-Aware Multi-Agent Reinforcement Learning

Learning Multi-Agent Cooperation via Considering Actions of Teammates

Complementary Attention for Multi-Agent Reinforcement Learning

Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning

Coordination as inference in multi-agent reinforcement learning

LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning

LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

Optimistic sequential multi-agent reinforcement learning with motivational communication

Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning

Intrinsic Reward with Peer Incentives for Cooperative Multi-Agent Reinforcement Learning