Abstract: Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, imitation learning often provides a more convenient and accessible alternative for practical learning-based control problems in domains such as robotics. In particular, an interactive imitation learning method such as DAgger queries a near-optimal expert to intervene online and collect correction data, which addresses the distributional shift that afflicts naïve behavioral cloning. Such a method can perform well both in theory and in practice without requiring manually specified reward functions or the other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with the user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and it enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. We also provide a unified framework for analyzing our RL method and DAgger, within which we present an asymptotic analysis of the suboptimality gap for both methods as well as a non-asymptotic sample complexity bound for our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world vision-based robotic manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io
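As a rough illustration of using intervention signals as rewards, the sketch below converts a per-step log of expert takeovers into a reward sequence that an off-policy RL learner could consume in place of a hand-specified task reward. The helper name `intervention_rewards`, the default penalty of -1, and the choice to place the penalty at the onset of each intervention are assumptions made for illustration. They are not taken from the abstract and may differ from the paper's exact specification.

```python
from typing import List


def intervention_rewards(intervening: List[bool], penalty: float = -1.0) -> List[float]:
    """Hypothetical helper: turn per-step intervention flags into rewards.

    intervening[t] is True when the human expert is in control at step t.
    Assumption: one penalty is assigned at the onset of each intervention
    (the step where control switches from the agent to the expert); every
    other step, including later steps of the same intervention, gets 0.
    No task reward is used at all.
    """
    rewards = []
    for t, flag in enumerate(intervening):
        # Onset: expert is in control now but was not at the previous step.
        # The t == 0 check short-circuits, so intervening[-1] is never read.
        onset = flag and (t == 0 or not intervening[t - 1])
        rewards.append(penalty if onset else 0.0)
    return rewards


if __name__ == "__main__":
    flags = [False, False, True, True, False, True]
    print(intervention_rewards(flags))  # [0.0, 0.0, -1.0, 0.0, 0.0, -1.0]
```

Because the reward depends only on whether the expert chose to take over, the expert's own actions never need to be near-optimal for this signal to be informative. The learner is only pushed away from states and actions that tend to trigger interventions.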

Diffskill: Improving Reinforcement Learning Through Diffusion-Based Skill Denoiser for Robotic Manipulation

DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools

Robust Policy Learning via Offline Skill Diffusion

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

Diff-Transfer: Model-based Robotic Manipulation Skill Transfer via Differentiable Physics Simulation

Unsupervised Reinforcement Learning for Transferable Manipulation Skill Discovery

Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning

RLIF: Interactive Imitation Learning as Reinforcement Learning

A data-efficient goal-directed deep reinforcement learning method for robot visuomotor skill

Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

Skill Enhancement Learning with Knowledge Distillation

Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play

A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models

Training Diffusion Models with Reinforcement Learning

Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition

SLIM: Skill Learning with Multiple Critics

Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks

Unpacking the Individual Components of Diffusion Policy