Abstract: Transformers, originally devised for natural language processing (NLP), have also produced significant successes in computer vision (CV). Due to their strong expressive power, researchers are investigating ways to deploy transformers for reinforcement learning (RL), and transformer-based models have shown their potential on representative RL benchmarks. In this paper, we collect and dissect recent advances concerning the transformation of RL with transformers, i.e., transformer-based RL (TRL), to explore the development trajectory and future trends of this field. We group the existing developments into two categories, architecture enhancements and trajectory optimizations, and examine the main applications of TRL in robotic manipulation, text-based games (TBGs), navigation, and autonomous driving. Architecture enhancement methods consider how to apply the powerful transformer structure to RL problems under the traditional RL framework, facilitating more precise modeling of agents and environments than traditional deep RL techniques. However, these methods are still limited by the inherent defects of traditional RL algorithms, such as bootstrapping and the "deadly triad". Trajectory optimization methods treat RL problems as sequence modeling problems and train a joint state-action model over entire trajectories under the behavior cloning framework; such approaches can extract policies from static datasets and fully use the long-sequence modeling capabilities of transformers. Given these advancements, we review the limitations and challenges in TRL and discuss proposals for future research directions. We hope that this survey can provide a detailed introduction to TRL and motivate future research in this rapidly developing field.

Combining Reinforcement Learning and Tensor Networks, with an Application to Dynamical Large Deviations

A Tensor Network Implementation of Multi Agent Reinforcement Learning

Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control

Overcoming Delayed Feedback in Reinforcement Learning Using Actor Ensembles

Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning

Scalable Reinforcement Learning for Multi-Agent Networked Systems

Efficient Reinforcement Learning in Continuous State and Action Spaces with Dyna and Policy Approximation

Reinforcement Learning for Infinite-Dimensional Systems

Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

Adaptive Learning of Tensor Network Structures

Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function

Staged Reinforcement Learning for Complex Tasks through Decomposed Environments

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

Stabilizing Visual Reinforcement Learning Via Asymmetric Interactive Cooperation

Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

Stable and Safe Human-aligned Reinforcement Learning through Neural Ordinary Differential Equations

Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble