Abstract: Transformers, originally devised for natural language processing (NLP), have also achieved significant success in computer vision (CV). Because of transformers' strong expressive power, researchers are investigating ways to deploy them for reinforcement learning (RL), and transformer-based models have demonstrated their potential on representative RL benchmarks. In this paper, we collect and dissect recent advances in transforming RL with transformers, termed transformer-based RL (TRL), to explore the development trajectory and future trends of this field. We group the existing developments into two categories, architecture enhancements and trajectory optimizations, and examine the main applications of TRL in robotic manipulation, text-based games (TBGs), navigation, and autonomous driving. Architecture enhancement methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework, facilitating more precise modeling of agents and environments than traditional deep RL techniques. However, these methods are still limited by the inherent defects of traditional RL algorithms, such as bootstrapping and the "deadly triad". Trajectory optimization methods treat RL problems as sequence modeling problems and train a joint state-action model over entire trajectories under the behavior cloning framework; such approaches can extract policies from static datasets and fully exploit the long-sequence modeling capabilities of transformers. Given these advancements, we review the limitations and challenges in TRL and discuss proposals for future research directions. We hope that this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field.

Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning

Reinforcement learning under temporal logic constraints as a sequence modelling problem

Reinforcement learning under temporal logic constraints as a sequence modeling problem

Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications

Constrained Reinforcement Learning for Vehicle Motion Planning with Topological Reachability Analysis

Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference

QT-TDM: Planning with Transformer Dynamics Model and Autoregressive Q-Learning

Directed Exploration in Reinforcement Learning from Linear Temporal Logic

On Transforming Reinforcement Learning by Transformer: The Development Trajectory

Logically Constrained Robotics Transformers for Enhanced Perception-Action Planning

Deep Reinforcement Learning with Temporal Logics

Motion Planning for Mobile Robots with Temporal Logic Specifications

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

CoT-TL: Low-Resource Temporal Knowledge Representation of Planning Instructions Using Chain-of-Thought Reasoning

Physics-based Motion Planning with Temporal Logic Specifications

Overcoming Exploration: Deep Reinforcement Learning for Continuous Control in Cluttered Environments from Temporal Logic Specifications

End-to-End Path Planning Under Linear Temporal Logic Specifications

Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments

A Framework for Following Temporal Logic Instructions with Unknown Causal Dependencies

Deep Learning-Based Path Planning Under Co-Safe Temporal Logic Specifications

Nl2Hltl2Plan: Scaling Up Natural Language Understanding for Multi-Robots Through Hierarchical Temporal Logic Task Representation