Abstract: Transformers, originally devised for natural language processing (NLP), have also achieved significant success in computer vision (CV). Owing to their strong expressive power, researchers are investigating ways to deploy transformers for reinforcement learning (RL), and transformer-based models have shown their potential on representative RL benchmarks. In this paper, we collect and dissect recent advances in transforming RL with transformers, i.e., transformer-based RL (TRL), to explore the development trajectory and future trends of this field. We group the existing developments into two categories, architecture enhancements and trajectory optimizations, and examine the main applications of TRL in robotic manipulation, text-based games (TBGs), navigation, and autonomous driving. Architecture enhancement methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework, facilitating more precise modeling of agents and environments than traditional deep RL techniques. However, these methods are still limited by the inherent defects of traditional RL algorithms, such as bootstrapping and the "deadly triad". Trajectory optimization methods treat RL problems as sequence modeling problems and train a joint state-action model over entire trajectories under the behavior cloning framework; such approaches can extract policies from static datasets and fully exploit the long-sequence modeling capabilities of transformers. Given these advancements, we review the limitations and challenges of TRL and discuss proposals for future research directions. We hope that this survey provides a detailed introduction to TRL and motivates future research in this rapidly developing field.
