Abstract: Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online and thereby collects correction data that addresses the distributional shift afflicting naïve behavioral cloning, can perform well both in theory and in practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with the user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. We also provide a unified framework for analyzing our RL method and DAgger, in which we present an asymptotic analysis of the suboptimality gap for both methods as well as a non-asymptotic sample complexity bound for our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io

RLIF: Interactive Imitation Learning as Reinforcement Learning
