Abstract: Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, an important step toward deploying multi-agent systems in real-world applications. In practice, however, the individual behavior policies that generate multi-agent joint trajectories usually differ in performance level; e.g., one agent may follow a random policy while the other agents follow medium policies. In a cooperative game with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline datasets into prioritized experience replay with individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a graph attention network (GAT) based critic. We evaluate our method in both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results on complex and mixed offline multi-agent datasets, especially when the difference in data quality between individual trajectories is large.
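
The dataset reconstruction step described above can be illustrated with a minimal sketch. It is not the SIT implementation: the per-agent credits are passed in as plain numbers (in the paper they come from the attention-based reward decomposition network), and the names `IndividualTransition`, `split_joint_transition`, `priorities`, and `sample_shared` are hypothetical. It also assumes homogeneous agents, so that one agent can reuse another agent's transitions, and uses simple credit-proportional sampling as a stand-in for the prioritization scheme.

```python
import random
from dataclasses import dataclass


@dataclass
class IndividualTransition:
    agent_id: int
    obs: tuple
    action: int
    credit: float  # assumed output of some reward decomposition model


def split_joint_transition(joint_obs, joint_actions, credits):
    """Turn one joint transition into one individual transition per agent."""
    return [
        IndividualTransition(i, o, a, c)
        for i, (o, a, c) in enumerate(zip(joint_obs, joint_actions, credits))
    ]


def priorities(transitions, eps=1e-3):
    """Shift credits so the lowest credit still gets a small positive priority."""
    lowest = min(t.credit for t in transitions)
    return [t.credit - lowest + eps for t in transitions]


def sample_shared(transitions, batch_size, rng):
    """Sample from the pooled buffer in proportion to priority.

    The buffer is shared: a batch for any agent may contain transitions
    that were originally generated by a different agent.
    """
    return rng.choices(transitions, weights=priorities(transitions), k=batch_size)


# Two joint transitions from a two-agent dataset (made-up values).
buffer = []
buffer += split_joint_transition([(0.0,), (1.0,)], [0, 1], credits=[0.1, 0.9])
buffer += split_joint_transition([(0.5,), (1.5,)], [1, 0], credits=[0.0, 1.0])

batch = sample_shared(buffer, batch_size=4, rng=random.Random(0))
```

In this toy buffer, the transitions credited to agent 1 receive far larger sampling weights than those of agent 0, so a policy trained on the pooled data would see mostly the higher-credit behavior regardless of which agent produced it.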

Offline Communication Learning with Multi-source Datasets

Observer-Based Multiagent Deep Reinforcement Learning: A Fully Distributed Training Scheme

Efficient Communication via Self-supervised Information Aggregation for Online and Offline Multi-agent Reinforcement Learning

Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning

Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data

Coordination Failure in Cooperative Offline MARL

Fully Independent Communication in Multi-Agent Reinforcement Learning

Learning Structured Communication for Multi-agent Reinforcement Learning

Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization

A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem

Efficient Communications in Multi-Agent Reinforcement Learning for Mobile Applications

Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning

Settling the Communication Complexity for Distributed Offline Reinforcement Learning

ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

AC2C: Adaptively Controlled Two-Hop Communication for Multi-Agent Reinforcement Learning

Generalising Multi-Agent Cooperation through Task-Agnostic Communication

Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph

Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization

Context-aware Communication for Multi-agent Reinforcement Learning