Abstract: Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online and collects correction data to address the distributional shift that afflicts naïve behavioral cloning, can perform well both in theory and in practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with the user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. We also provide a unified framework to analyze our RL method and DAgger, for which we present an asymptotic analysis of the suboptimality gap for both methods as well as a non-asymptotic sample complexity bound for our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io
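
As a rough illustration of "intervention signals as rewards", the sketch below relabels a logged trajectory using only a per-step flag that records whether the human was in control. The abstract does not give the exact reward definition, so the convention used here (a penalty of -1 on the step immediately before an intervention begins, 0 everywhere else) is an assumption, and the `Step` type and `label_rewards` helper are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    state: int        # placeholder observation (hypothetical)
    action: int       # action that was actually executed
    intervened: bool  # True if the human was in control at this step


def label_rewards(steps: List[Step], penalty: float = -1.0) -> List[float]:
    """Derive rewards from intervention flags alone.

    Assumed convention (not taken from the abstract): the step right before
    an intervention starts receives `penalty`; all other steps receive 0.
    """
    rewards = [0.0] * len(steps)
    for t in range(len(steps) - 1):
        # An intervention "starts" when control passes from policy to human.
        if steps[t + 1].intervened and not steps[t].intervened:
            rewards[t] = penalty
    return rewards


# Toy trajectory: the human takes over at step 2 and hands back at step 4.
trajectory = [
    Step(state=0, action=1, intervened=False),
    Step(state=1, action=0, intervened=False),
    Step(state=2, action=1, intervened=True),
    Step(state=3, action=1, intervened=True),
    Step(state=4, action=0, intervened=False),
]
rewards = label_rewards(trajectory)
```

Transitions labeled this way could then be stored in a replay buffer and consumed by any off-policy RL algorithm; no task reward function is needed, which matches the setting the abstract describes.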

Data-Efficient Reinforcement Learning Using Active Exploration Method

Model-Based Robot Learning Control with Uncertainty Directed Exploration

Accelerating Reinforcement Learning with Local Data Enhancement for Process Control

Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Improving PILCO with Bayesian Neural Network Dynamics Models

Enhanced Probabilistic Inference Algorithm Using Probabilistic Neural Networks For Learning Control

Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Efficient Reinforcement Learning via Decoupling Exploration and Utilization

RL-Driven MPPI: Accelerating Online Control Laws Calculation with Offline Policy

PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning

Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep Reinforcement Learning with Demonstration-like Sampled Exploration

COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Training Efficient Controllers via Analytic Policy Gradient

Blending Imitation and Reinforcement Learning for Robust Policy Improvement

Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control

Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

RLIF: Interactive Imitation Learning as Reinforcement Learning

Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization

Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments