Abstract: Offline reinforcement learning leverages previously collected offline datasets to learn optimal policies without needing to access the real environment. Such a paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the combinatorially increased interactions among agents and with the environment. However, in MARL, the paradigm of offline pre-training with online fine-tuning has not been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we facilitate the research by providing large-scale datasets and using them to examine the usage of the decision transformer in the context of MARL. We investigate the generalization of MARL offline pre-training in the following three aspects: 1) between single agents and multiple agents, 2) from offline pre-training to online fine-tuning, and 3) to multiple downstream tasks with few-shot and zero-shot capabilities. We start by introducing the first offline MARL dataset with diverse quality levels based on the StarCraft II environment, and then propose the novel architecture of the multi-agent decision transformer (MADT) for effective offline learning. MADT leverages the transformer's capacity for sequence modelling and integrates it seamlessly with both offline and online MARL tasks. A significant benefit of MADT is that it learns generalizable policies that can transfer between different types of agents under different task scenarios. On the StarCraft II offline dataset, MADT outperforms the state-of-the-art offline reinforcement learning (RL) baselines, including BCQ and CQL. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency and enjoys strong performance in both few-shot and zero-shot cases. To the best of our knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalizability enhancements for MARL.
