Abstract: Large sequence models (SMs) such as the GPT series and BERT have displayed outstanding performance and generalization capabilities on vision, language, and, more recently, reinforcement learning tasks. A natural follow-up question is how to abstract multi-agent decision making into an SM problem and benefit from the rapid development of SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that casts cooperative multi-agent reinforcement learning (MARL) as an SM problem in which the task is to map the agents' observation sequence to the agents' optimal action sequence. Our goal is to build a bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to MAT is an encoder-decoder architecture that leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision-making process; this yields only linear time complexity for multi-agent problems and, most importantly, endows MAT with a monotonic performance improvement guarantee. Unlike prior art such as Decision Transformer, which fits only pre-collected offline data, MAT is trained on-policy through online trial and error in the environment. To validate MAT, we conduct extensive experiments on the StarCraft II, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. Results demonstrate that MAT achieves superior performance and data efficiency compared to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that MAT is an excellent few-shot learner on unseen tasks regardless of changes in the number of agents. See our project page at https://sites.google.com/view/multi-agent-transformer.
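
The idea of turning joint policy search into a sequential decision process can be illustrated with a minimal sketch. This is not MAT's network or training loop: the function names and the toy advantage function below are hypothetical, chosen only to show that picking actions one agent at a time, each conditioned on the actions already chosen, needs a number of evaluations that grows linearly with the number of agents, whereas enumerating joint actions grows exponentially. In this toy the greedy sequential choice happens to recover the best joint action; that coincidence is a property of the toy, not a general claim.

```python
from itertools import product


def sequential_greedy(advantage, n_agents, n_actions):
    """Choose actions agent by agent, each conditioned on earlier choices."""
    chosen = []
    evaluations = 0
    for agent in range(n_agents):
        best_action, best_value = None, float("-inf")
        for action in range(n_actions):
            evaluations += 1
            value = advantage(agent, tuple(chosen), action)
            if value > best_value:
                best_action, best_value = action, value
        chosen.append(best_action)
    return tuple(chosen), evaluations


def toy_advantage(agent, previous_actions, action):
    """Hypothetical per-agent advantage (an assumption for illustration):
    agent 0 prefers action 1; every later agent prefers to copy its predecessor."""
    target = previous_actions[-1] if previous_actions else 1
    return 1.0 if action == target else 0.0


def joint_advantage(actions):
    """Joint advantage written as a sum of per-agent conditional terms."""
    return sum(toy_advantage(i, actions[:i], a) for i, a in enumerate(actions))


def brute_force(n_agents, n_actions):
    """Enumerate every joint action; cost is n_actions ** n_agents."""
    best = max(product(range(n_actions), repeat=n_agents), key=joint_advantage)
    return best, n_actions ** n_agents


if __name__ == "__main__":
    print(sequential_greedy(toy_advantage, n_agents=3, n_actions=4))
    print(brute_force(n_agents=3, n_actions=4))
```

With 3 agents and 4 actions each, the sequential pass evaluates 3 × 4 candidate actions, while the exhaustive search scores 4³ joint actions.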

Bi-level Multi-Agent Actor-Critic Methods with Transformers

MAPPO method based on attention behavior network

Efficient Multi-Agent Exploration with Mutual-Guided Actor-Critic

Meta Actor-Critic Framework for Multi-Agent Reinforcement Learning

Bi-Level Actor-Critic for Multi-Agent Coordination

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers

Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning

APC: Predict Global Representation from Local Observation in Multi-Agent Reinforcement Learning

UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers

Multi-Agent Reinforcement Learning with Selective State-Space Models

Value-Decomposition Multi-Agent Actor-Critics

Cooperative multi-agent game based on reinforcement learning

Passivity analysis for switched generalized neural networks with time-varying delay and uncertain output

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

RoMAT: Role-based multi-agent transformer for generalizable heterogeneous cooperation

B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency

Off-Policy Multi-Agent Decomposed Policy Gradients

Multi actor hierarchical attention critic with RNN-based feature extraction

Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction

Multi-Agent Actor-Critic with Hierarchical Graph Attention Network