Abstract: Existing value-factorization based Multi-Agent deep Reinforcement Learning (MARL) approaches perform well in various multi-agent cooperative environments under the centralized training and decentralized execution (CTDE) scheme, where all agents are trained together by a centralized value network and each agent executes its policy independently. However, an issue remains open: in the centralized training process, when the environment for the team is partially observable or non-stationary, i.e., the observation and action information of all the agents cannot represent the global states, existing methods perform poorly and are sample-inefficient. Regret Minimization (RM) can be a promising approach as it performs well in partially observable and fully competitive settings. However, it tends to model others as opponents and thus cannot work well under the CTDE scheme. In this work, we propose a novel team RM based Bayesian MARL with three key contributions: (a) we design a novel RM method to train cooperative agents as a team and obtain a team regret-based policy for that team; (b) we introduce a novel method to decompose the team regret to generate the policy for each agent for decentralized execution; (c) to further improve the performance, we leverage a differential particle filter (a Sequential Monte Carlo method) network to get an accurate estimation of the state for each agent. Experimental results on two-step matrix games (cooperative game) and battle games (large-scale mixed cooperative-competitive games) demonstrate that our algorithm significantly outperforms state-of-the-art methods.
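For readers unfamiliar with regret minimization, the following is a minimal sketch of classic single-agent regret matching, the textbook building block that RM methods start from. It is not the team regret method or the regret decomposition proposed in the abstract; the function names `regret_matching` and `update_regret` and the toy action values are assumptions made purely for illustration.

```python
def regret_matching(cumulative_regret):
    """Map cumulative regrets to a policy.

    Each action gets probability proportional to its positive cumulative
    regret; if no action has positive regret, fall back to uniform.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regret(cumulative_regret, action_values, policy):
    """Accumulate instantaneous regret for one step.

    Instantaneous regret of an action is its value minus the expected
    value of the current policy (hypothetical values supplied by caller).
    """
    expected = sum(p * v for p, v in zip(policy, action_values))
    return [r + (v - expected) for r, v in zip(cumulative_regret, action_values)]


# Toy usage: two actions, the first one is better.
regrets = [0.0, 0.0]
policy = regret_matching(regrets)                    # uniform at the start
regrets = update_regret(regrets, [1.0, 0.0], policy)
policy = regret_matching(regrets)                    # shifts toward action 0
```

In a cooperative setting like the one the abstract targets, the regret would presumably be defined over the team's joint action rather than per opponent, but how that is done is specific to the paper and is not reproduced here.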

Trustable Policy Collaboration Scheme for Multi-Agent Stigmergic Reinforcement Learning

Stigmergic Independent Reinforcement Learning for Multi-Agent Collaboration

Individual Reward Assisted Multi-Agent Reinforcement Learning

Communication-Efficient Soft Actor-Critic Policy Collaboration via Regulated Segment Mixture

Off-Agent Trust Region Policy Optimization

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

Mutual-Information Regularized Multi-Agent Policy Iteration

Autonomous Intersection Management with Heterogeneous Vehicles: A Multi-Agent Reinforcement Learning Approach

Trust-based Consensus in Multi-Agent Reinforcement Learning Systems

A Game-Theoretic Approach to Multi-agent Trust Region Optimization

Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning

Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

Efficient Distributed Framework for Collaborative Multi-Agent Reinforcement Learning

Stabilizing Multi-Agent Deep Reinforcement Learning by Implicitly Estimating Other Agents’ Behaviors

Attention Enhanced Reinforcement Learning for Multi agent Cooperation

Mixed Reinforcement Learning with Additive Stochastic Uncertainty

Inducing Cooperation via Team Regret Minimization based Multi-Agent Deep Reinforcement Learning

Heterogeneous-Agent Reinforcement Learning

Prioritized League Reinforcement Learning for Large-Scale Heterogeneous Multiagent Systems

Multi-Agent Reinforcement Learning-Based Decision Making for Twin-Vehicles Cooperative Driving in Stochastic Dynamic Highway Environments

An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control