Abstract: Transformers, originally devised for natural language processing (NLP), have also achieved significant success in computer vision (CV). Owing to the strong expressive power of transformers, researchers are investigating ways to deploy them in reinforcement learning (RL), and transformer-based models have demonstrated their potential on representative RL benchmarks. In this paper, we collect and dissect recent advances in transforming RL with transformers, i.e., transformer-based RL (TRL), to explore the development trajectory and future trends of this field. We group the existing developments into two categories, architecture enhancements and trajectory optimizations, and examine the main applications of TRL in robotic manipulation, text-based games (TBGs), navigation, and autonomous driving. Architecture enhancement methods consider how to apply the powerful transformer structure to RL problems within the traditional RL framework, enabling more precise modeling of agents and environments than traditional deep RL techniques. However, these methods remain limited by the inherent defects of traditional RL algorithms, such as bootstrapping and the "deadly triad". Trajectory optimization methods treat RL problems as sequence modeling problems and train a joint state-action model over entire trajectories under the behavior cloning framework; such approaches can extract policies from static datasets and fully exploit the long-sequence modeling capabilities of transformers. Building on these advances, we review the limitations and challenges of TRL and discuss proposals for future research directions. We hope that this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field.

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

Transformers are Meta-Reinforcement Learners

Transformer in Transformer As Backbone for Deep Reinforcement Learning

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

On Transforming Reinforcement Learning by Transformer: The Development Trajectory

Recurrent Action Transformer with Memory

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning

Transformer Memory for Interactive Visual Navigation in Cluttered Environments

Reinforcement Learning from Bagged Reward: A Transformer-based Approach for Instance-Level Reward Redistribution

Towards Long-delayed Sparsity: Learning a Better Transformer Through Reward Redistribution

Rethinking Transformers in Solving POMDPs

Preference Transformer: Modeling Human Preferences using Transformers for RL

Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents

Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

On the Long Range Abilities of Transformers

Decision Transformer: Reinforcement Learning via Sequence Modeling

Solving time-delay issues in reinforcement learning via transformers

AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

Think Before You Act: Decision Transformers with Working Memory