Abstract: Interacting with the actual environment to acquire data is often costly and time-consuming in robotic tasks. Model-based offline reinforcement learning (RL) provides a feasible solution. On the one hand, it eliminates the need to interact with the actual environment. On the other hand, it learns the transition dynamics and reward function from offline datasets and generates simulated rollouts to accelerate training. Previous model-based offline RL methods adopt probabilistic ensembles of neural networks (NNs) to model aleatoric and epistemic uncertainty. However, this substantially increases training time and computing resource requirements. Furthermore, these methods are easily disturbed by the accumulated errors of the learned dynamics models when simulating long-horizon rollouts. To solve these problems, we propose an uncertainty-aware sequence modeling architecture called Environment Transformer. It models the probability distribution of the environment dynamics and reward function to capture aleatoric uncertainty, and it treats epistemic uncertainty as a learnable noise parameter. Benefiting from accurate modeling of the transition dynamics and reward function, Environment Transformer can be combined with arbitrary planning, dynamic programming, or policy optimization algorithms for offline RL. In this work, we use Conservative Q-Learning (CQL) to learn a conservative Q-function. Through simulation experiments, we demonstrate that our method achieves or exceeds state-of-the-art performance on widely studied offline RL benchmarks. Moreover, we show that Environment Transformer's simulated rollout quality, sample efficiency, and long-horizon rollout capability are superior to those of previous model-based offline RL methods.
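The abstract describes capturing aleatoric uncertainty by modeling a probability distribution over the next state and reward, then using that model to generate simulated rollouts. A minimal sketch of that idea follows. It assumes a diagonal Gaussian output head (the abstract does not state the distribution family) and a model that conditions only on the current state and action rather than on a full sequence, as a transformer would. The names `gaussian_nll`, `sample_transition`, `simulate_rollout`, `toy_model`, and `toy_policy` are hypothetical, and the learnable epistemic-noise parameter is not modeled here.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature (an assumption, not the paper's API):
# model(state, action) -> (mean, log_std), each over [next_state..., reward].
Model = Callable[[List[float], List[float]], Tuple[List[float], List[float]]]
Policy = Callable[[List[float]], List[float]]


def gaussian_nll(target: Sequence[float], mean: Sequence[float],
                 log_std: Sequence[float]) -> float:
    """Negative log-likelihood of `target` under a diagonal Gaussian.

    Summed over dimensions. Minimizing it lets a model learn both a
    prediction (mean) and its own noise level (log_std), i.e. aleatoric
    uncertainty.
    """
    nll = 0.0
    for y, mu, ls in zip(target, mean, log_std):
        var = math.exp(2.0 * ls)
        nll += 0.5 * ((y - mu) ** 2 / var + 2.0 * ls + math.log(2.0 * math.pi))
    return nll


def sample_transition(mean: Sequence[float], log_std: Sequence[float],
                      rng: random.Random) -> List[float]:
    """Draw one outcome from N(mean, exp(log_std)^2), dimension by dimension."""
    return [mu + math.exp(ls) * rng.gauss(0.0, 1.0) for mu, ls in zip(mean, log_std)]


def simulate_rollout(model: Model, policy: Policy, state: List[float],
                     horizon: int, rng: random.Random):
    """Roll the learned model forward, returning (s, a, r, s') tuples."""
    transitions = []
    for _ in range(horizon):
        action = policy(state)
        mean, log_std = model(state, action)
        sample = sample_transition(mean, log_std, rng)
        next_state, reward = sample[:-1], sample[-1]  # last dimension is the reward
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions


if __name__ == "__main__":
    # Toy stand-in for a trained model: the state moves by the action,
    # reward is the negative L1 norm of the state, and every output has std 0.1.
    def toy_model(state, action):
        mean = [s + a for s, a in zip(state, action)] + [-sum(abs(s) for s in state)]
        return mean, [math.log(0.1)] * len(mean)

    def toy_policy(state):
        return [-0.5 * s for s in state]

    trajectory = simulate_rollout(toy_model, toy_policy, [1.0, -2.0], horizon=5,
                                  rng=random.Random(0))
    print(len(trajectory))  # prints 5
```

Training such a head would minimize `gaussian_nll` over dataset transitions so that the predicted standard deviation tracks the noise in the data. Sampling from the learned distribution at every step is what makes each simulated rollout stochastic rather than a single deterministic path.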

Q-value Regularized Transformer for Offline Reinforcement Learning

Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Critic-Guided Decision Transformer for Offline Reinforcement Learning

QT-TDM: Planning with Transformer Dynamics Model and Autoregressive Q-Learning

Stabilizing Transformer-Based Action Sequence Generation For Q-Learning

Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL

Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning

Offline Trajectory Generalization for Offline Reinforcement Learning

Constrained Decision Transformer for Offline Safe Reinforcement Learning

Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning

Strategically Conservative Q-Learning

On Transforming Reinforcement Learning by Transformer: The Development Trajectory

Predictive Coding for Decision Transformer

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

Bootstrapped Transformer for Offline Reinforcement Learning

Action Q-Transformer: Visual Explanation in Deep Reinforcement Learning with Encoder-Decoder Model using Action Query

Adaptive Pessimism via Target Q-Value for Offline Reinforcement Learning

Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning