Abstract: Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online and collect correction data that addresses the distributional shift afflicting naïve behavioral cloning, can perform well both in theory and in practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with the user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. We also provide a unified framework for analyzing our RL method and DAgger, for which we present an asymptotic analysis of the suboptimality gap for both methods as well as a non-asymptotic sample complexity bound for our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io

Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

Hybrid Reinforcement Learning Based on Human Preference and Advice for Efficient Robot Skill Learning

RLIF: Interactive Imitation Learning as Reinforcement Learning

A Review on Interactive Reinforcement Learning from Human Social Feedback

Learning Preferences for Interactive Autonomy

Offline Reward Shaping with Scaling Human Preference Feedback for Deep Reinforcement Learning

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

Social Interaction for Efficient Agent Learning from Human Reward

Human Social Feedback for Efficient Interactive Reinforcement Agent Learning

Multi-trainer binary feedback interactive reinforcement learning

Interactive Learning from Policy-Dependent Human Feedback

PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training

Actor-Critic Reinforcement Learning with Simultaneous Human Control and Feedback

From "Thumbs Up" to "10 out of 10": Reconsidering Scalar Feedback in Interactive Reinforcement Learning

Learning from Human Reward Benefits from Socio-Competitive Feedback

Training Robots to Evaluate Robots: Example-Based Interactive Reward Functions for Policy Learning

Reinforcement Learning based Embodied Agents Modelling Human Users Through Interaction and Multi-Sensory Perception

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback

i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

Weak Human Preference Supervision for Deep Reinforcement Learning