Abstract: Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online and collects correction data to address the distributional shift that afflicts naïve behavioral cloning, can enjoy good performance both in theory and in practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with the user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. We also provide a unified framework for analyzing our RL method and DAgger, within which we present an asymptotic analysis of the suboptimality gap for both methods as well as a non-asymptotic sample complexity bound for our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io
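As a rough illustration of what "intervention signals as rewards" could look like in practice, the sketch below labels a recorded trajectory with rewards derived only from whether the human took over. The `Step` record, the `label_rewards` helper, the penalty value of -1, and the choice to penalize the policy step immediately preceding the onset of an intervention are all assumptions made for this example; the abstract does not specify these details.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One recorded time step (hypothetical record type for this sketch)."""
    obs: int
    action: int
    intervened: bool  # True if the human expert was in control at this step


def label_rewards(steps: List[Step], penalty: float = -1.0) -> List[float]:
    """Derive rewards from intervention signals alone.

    Assumption: the step right before an intervention begins receives
    `penalty`; every other step receives 0. No task reward is used.
    """
    rewards = [0.0] * len(steps)
    for t in range(1, len(steps)):
        # An intervention "begins" when the flag switches from False to True.
        if steps[t].intervened and not steps[t - 1].intervened:
            rewards[t - 1] = penalty
    return rewards


# Usage: a six-step trajectory with two separate interventions.
flags = [False, False, True, True, False, True]
trajectory = [Step(obs=i, action=0, intervened=f) for i, f in enumerate(flags)]
print(label_rewards(trajectory))
```

Transitions labeled this way could then be stored in a replay buffer and consumed by any off-policy RL learner, which is the setting the abstract describes.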

RLlib: Abstractions for distributed reinforcement learning

Robot Simulation and Reinforcement Learning Training Platform Based on Distributed Architecture

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research

RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

RLLTE: Long-Term Evolution Project of Reinforcement Learning

RLgraph: Modular Computation Graphs for Deep Reinforcement Learning

ChainerRL: A Deep Reinforcement Learning Library

Efficient Parallel Reinforcement Learning Framework using the Reactor Model

MSRL: Distributed Reinforcement Learning with Dataflow Fragments

Acme: A Research Framework for Distributed Reinforcement Learning

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

RLIF: Interactive Imitation Learning as Reinforcement Learning

CoRL: Environment Creation and Management Focused on System Integration

On the Foundation of Distributionally Robust Reinforcement Learning

InfraLib: Enabling Reinforcement Learning and Decision Making for Large Scale Infrastructure Management

LExCI: A Framework for Reinforcement Learning with Embedded Systems

Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine