Abstract: In offline Imitation Learning (IL), one of the main challenges is the \textit{covariate shift} between the expert observations and the state distribution actually encountered by the agent: it is difficult to determine what action the agent should take once it leaves the state distribution of the expert demonstrations. Recent model-free solutions introduce supplementary data and identify latent expert-similar samples within it to augment the reliable samples used during learning. Model-based solutions build forward dynamics models with conservatism quantification and then generate additional trajectories in the neighborhood of the expert demonstrations. However, without reward supervision, these methods are often over-conservative in regions outside the expert support, because a preferred action that enables policy optimization exists only in states close to expert-observed states. To encourage more exploration of expert-unobserved states, we propose a novel model-based framework called offline Imitation Learning with Self-paced Reverse Augmentation (SRA). Specifically, we build a reverse dynamics model from the offline demonstrations, which can efficiently generate trajectories leading to expert-observed states in a self-paced manner. We then apply a reinforcement learning method to the augmented trajectories so that the policy learns to transition from expert-unobserved states to expert-observed states. This framework not only explores expert-unobserved states but also guides the policy to maximize long-term returns in these states, ultimately enabling generalization beyond the expert data. Empirical results show that our proposal can effectively mitigate covariate shift and achieve state-of-the-art performance on offline imitation learning benchmarks. Project website: \url{https://www.lamda.nju.edu.cn/shaojj/KDD24_SRA/}.
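
The reverse-augmentation idea can be illustrated with a minimal Python sketch. It is not the authors' SRA implementation: `toy_reverse_model` and `toy_reverse_policy` are hypothetical toy functions standing in for learned networks, and `self_paced_horizon`, which simply lengthens rollouts as training proceeds, is an assumed stand-in for the paper's self-paced schedule, whose exact form the abstract does not specify. The sketch shows only the core mechanism: start from expert-observed states and step backwards through a reverse model, so that every generated trajectory, read forwards, ends inside the expert's support.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Transition = Tuple[State, Action, State]  # (s, a, s_next) in forward time order


def self_paced_horizon(stage: int, max_horizon: int) -> int:
    """Hypothetical curriculum: short rollouts first, longer ones later."""
    return min(max_horizon, 1 + stage)


def reverse_rollouts(
    expert_states: Sequence[State],
    reverse_policy: Callable[[State, random.Random], Action],
    reverse_model: Callable[[State, Action], State],
    horizon: int,
    rng: random.Random,
) -> List[List[Transition]]:
    """Roll a reverse dynamics model backwards from each expert-observed state.

    Every returned trajectory ends at an expert state, so a forward RL learner
    trained on these transitions sees paths from unobserved states back into
    the expert's support.
    """
    trajectories: List[List[Transition]] = []
    for s_end in expert_states:
        traj: List[Transition] = []
        s_next = s_end
        for _ in range(horizon):
            a = reverse_policy(s_next, rng)    # an action assumed to lead into s_next
            s_prev = reverse_model(s_next, a)  # predicted predecessor state
            traj.append((s_prev, a, s_next))
            s_next = s_prev
        traj.reverse()  # forward order: the farthest-from-expert state comes first
        trajectories.append(traj)
    return trajectories


# Toy usage (hypothetical): a 1-D point mass with forward dynamics s' = s + a,
# so the exact reverse model is s = s' - a.
def toy_reverse_policy(s_next: State, rng: random.Random) -> Action:
    return (rng.uniform(-1.0, 1.0),)


def toy_reverse_model(s_next: State, a: Action) -> State:
    return tuple(x - u for x, u in zip(s_next, a))


rng = random.Random(0)
expert_states: List[State] = [(0.0,), (2.0,)]
augmented: List[List[Transition]] = []
for stage in range(3):
    h = self_paced_horizon(stage, max_horizon=5)
    augmented.extend(
        reverse_rollouts(expert_states, toy_reverse_policy, toy_reverse_model, h, rng)
    )
```

The transitions in `augmented` would then be passed to an offline RL learner; which learner is used, and how rewards are assigned to these transitions, is deliberately left open in this sketch.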

Offline Imitation Learning with Model-based Reverse Augmentation