Abstract: In offline Imitation Learning (IL), one of the main challenges is the \textit{covariate shift} between the expert observations and the actual distribution encountered by the agent, because it is difficult to determine what action an agent should take when it is outside the state distribution of the expert demonstrations. Recent model-free solutions introduce supplementary data and identify latent expert-similar samples to augment the reliable samples during learning. Model-based solutions build forward dynamics models with conservatism quantification and then generate additional trajectories in the neighborhood of expert demonstrations. However, without reward supervision, these methods are often over-conservative in out-of-expert-support regions, because only in states close to expert-observed states is there a preferred action that enables policy optimization. To encourage more exploration of expert-unobserved states, we propose a novel model-based framework, called offline Imitation Learning with Self-paced Reverse Augmentation (SRA). Specifically, we build a reverse dynamics model from the offline demonstrations, which can efficiently generate trajectories leading to the expert-observed states in a self-paced style. Then, we use a subsequent reinforcement learning method to learn from the augmented trajectories and transition from expert-unobserved states to expert-observed states. This framework not only explores the expert-unobserved states but also guides the maximization of long-term returns on these states, ultimately enabling generalization beyond the expert data. Empirical results show that our proposal effectively mitigates the covariate shift and achieves state-of-the-art performance on offline imitation learning benchmarks. Project website: \url{https://www.lamda.nju.edu.cn/shaojj/KDD24_SRA/}.
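The reverse-augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_reverse_model` is a hypothetical stand-in for a learned reverse dynamics model, the one-dimensional dynamics `next_state = state + action` are assumed purely for illustration, and the self-paced schedule is not modeled. Starting from each expert-observed state, the sketch rolls backward for a fixed horizon and stores the resulting transitions in forward order, so every generated trajectory ends in an expert-observed state.

```python
import random


def toy_reverse_model(next_state, rng):
    """Hypothetical stand-in for a learned reverse dynamics model.

    Assumes toy forward dynamics next_state = state + action, so the
    predecessor state is next_state - action.
    """
    action = rng.uniform(-1.0, 1.0)
    return next_state - action, action


def reverse_rollout(reverse_model, expert_states, horizon, rng):
    """Roll backward from each expert state for `horizon` steps.

    Returns (state, action, next_state) tuples; following them forward
    leads back to the expert state the rollout started from.
    """
    transitions = []
    for expert_state in expert_states:
        current = expert_state
        for _ in range(horizon):
            prev_state, action = reverse_model(current, rng)
            transitions.append((prev_state, action, current))
            current = prev_state
    return transitions


rng = random.Random(0)
augmented = reverse_rollout(toy_reverse_model, [0.0, 5.0], horizon=3, rng=rng)
```

In a full pipeline, the augmented transitions would be combined with the offline data and passed to a downstream reinforcement learning method; that step is omitted here.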

Uncertainty-Aware Data Augmentation for Offline Reinforcement Learning

DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning

Uncertainty-aware Distributional Offline Reinforcement Learning

Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning

UAC: Offline Reinforcement Learning with Uncertain Action Constraint

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

Augmenting Offline RL with Unlabeled Data

Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning

DARL: Distance-Aware Uncertainty Estimation for Offline Reinforcement Learning

A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning

Robust Offline Reinforcement Learning from Low-Quality Data

Augmenting Offline Reinforcement Learning with State-only Interactions

Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling

Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning

Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model

Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation

Adaptive Policy Learning for Offline-to-Online Reinforcement Learning

Offline Imitation Learning with Model-based Reverse Augmentation

AdaAugment: A Tuning-Free and Adaptive Approach to Enhance Data Augmentation

Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching