Abstract: Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online to collect correction data for addressing the distributional shift challenges that afflict naïve behavioral cloning, can enjoy good performance both in theory and in practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert. We also provide a unified framework to analyze our RL method and DAgger, for which we present an asymptotic analysis of the suboptimality gap for both methods as well as a non-asymptotic sample complexity bound for our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io
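
The idea of using intervention signals themselves as rewards can be illustrated with a minimal sketch. The abstract does not specify the exact reward convention, so the following is an assumption made for illustration only: a transition receives a negative reward when the human takes over at the very next step, and zero otherwise. The names `Step`, `label_rewards`, and `penalty` are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One time step of a logged rollout (hypothetical structure)."""
    state: int
    action: int
    intervened: bool  # True if the human expert was in control at this step


def label_rewards(steps: List[Step], penalty: float = -1.0) -> List[float]:
    """Derive rewards purely from intervention signals.

    Assumed convention (not confirmed by the abstract): the policy's step
    immediately before an intervention begins gets `penalty`; every other
    step gets 0. No task-specific reward function is needed.
    """
    rewards = [0.0] * len(steps)
    for t in range(len(steps) - 1):
        # An intervention *starts* when control passes from policy to human.
        if not steps[t].intervened and steps[t + 1].intervened:
            rewards[t] = penalty
    return rewards


# Example rollout: the human takes over at steps 2-3 and again at step 5.
rollout = [
    Step(state=0, action=1, intervened=False),
    Step(state=1, action=0, intervened=False),
    Step(state=2, action=1, intervened=True),
    Step(state=3, action=1, intervened=True),
    Step(state=4, action=0, intervened=False),
    Step(state=5, action=1, intervened=True),
]
rewards = label_rewards(rollout)
```

Transitions labeled this way could then be placed in a replay buffer and consumed by any off-policy reinforcement learning algorithm. Because the reward only encodes when the human chose to intervene, not what the human did, this kind of labeling does not by itself require the expert's corrective actions to be near-optimal.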

Task-Agnostic Learning to Accomplish New Tasks

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

Latent Plans for Task-Agnostic Offline Reinforcement Learning

Learning to combine primitive skills: A step towards versatile robotic manipulation

Task-Oriented Self-Imitation Learning for Robotic Autonomous Skill Acquisition

LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation

Effective Offline Robot Learning with Structured Task Graph

Learning from Guided Play: Improving Exploration for Adversarial Imitation Learning with Simple Auxiliary Tasks

A Task Learning Mechanism for the Telerobots

LTL-Transfer: Skill Transfer for Temporal Task Specification

Robotic Search & Rescue via Online Multi-task Reinforcement Learning

Accelerating Reinforcement Learning for Autonomous Driving using Task-Agnostic and Ego-Centric Motion Skills

Reinforcement Learning with Adaptive Policy Gradient Transfer Across Heterogeneous Problems

Active Task Randomization: Learning Robust Skills via Unsupervised Generation of Diverse and Feasible Tasks

RLIF: Interactive Imitation Learning as Reinforcement Learning

Autonomous learning of multiple, context-dependent tasks

Task-Oriented Deep Reinforcement Learning for Robotic Skill Acquisition and Control

Interesting Object, Curious Agent: Learning Task-Agnostic Exploration

State-Dependent Maximum Entropy Reinforcement Learning for Robot Long-Horizon Task Learning

Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies

Active Learning of Abstract Plan Feasibility