Abstract: Ideally, we would place a robot in a real-world environment and leave it there to improve on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to sample complexity, even sample-efficient techniques are hampered by two major challenges: the difficulty of providing well-"shaped" rewards, and the difficulty of continual reset-free training. In this work, we describe a system for real-world reinforcement learning that enables agents to show continual improvement by training directly in the real world, without requiring painstaking effort to hand-design reward functions or reset mechanisms. Our system leverages occasional non-expert human-in-the-loop feedback from remote users to learn informative distance functions that guide exploration, while using a simple self-supervised learning algorithm for goal-directed policy learning. We show that in the absence of resets, it is particularly important to account for the current "reachability" of the exploration policy when deciding which regions of the space to explore. Based on this insight, we instantiate a practical learning system, GEAR, which enables robots to simply be placed in real-world environments and left to train autonomously without interruption. The system streams robot experience to a web interface and requires only occasional asynchronous feedback from remote, crowdsourced, non-expert humans in the form of binary comparative feedback. We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and in the real world. Project website: https://guided-exploration-autonomous-rl.github.io/GEAR/
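As a rough illustration of how binary comparative feedback could be turned into a distance function, the sketch below fits a tiny linear model with a Bradley-Terry style logistic loss. The abstract does not specify the loss or the model, so everything here is an assumption: the loss choice, the linear model, the simulated annotator, and all function names (`predict_distance`, `comparison_loss_grad`, `make_comparisons`, `fit`) are hypothetical and are not taken from GEAR.

```python
import math
import random


def predict_distance(weights, state):
    """Hypothetical linear distance-to-goal model: lower means closer."""
    return sum(w * x for w, x in zip(weights, state))


def comparison_loss_grad(weights, closer, farther):
    """Logistic (Bradley-Terry style) loss for one binary comparison.

    The annotator judged `closer` to be nearer the goal than `farther`,
    so the model should output a smaller distance for `closer`.
    """
    margin = predict_distance(weights, farther) - predict_distance(weights, closer)
    p = 1.0 / (1.0 + math.exp(-margin))  # modelled probability of the given label
    loss = -math.log(p)
    # Gradient of -log(sigmoid(margin)) with respect to the weights.
    grad = [-(1.0 - p) * (f - c) for c, f in zip(closer, farther)]
    return loss, grad


def make_comparisons(n_pairs, seed=0):
    """Simulated annotator (an assumption, standing in for remote users).

    Each state is a pair of non-negative per-axis offsets from the goal;
    the annotator prefers the state with the smaller total offset.
    Returns a list of (closer, farther) pairs.
    """
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        a = [rng.random(), rng.random()]
        b = [rng.random(), rng.random()]
        if sum(a) == sum(b):
            continue  # skip ties, which carry no preference
        pairs.append((a, b) if sum(a) < sum(b) else (b, a))
    return pairs


def fit(comparisons, dim, lr=0.1, epochs=200, seed=0):
    """Plain stochastic gradient descent over the comparison pairs."""
    rng = random.Random(seed)
    weights = [0.0] * dim
    data = list(comparisons)
    for _ in range(epochs):
        rng.shuffle(data)
        for closer, farther in data:
            _, grad = comparison_loss_grad(weights, closer, farther)
            weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights
```

Calling `fit(make_comparisons(50), dim=2)` returns weights under which states with smaller offsets receive smaller predicted distances. In a setting like the one the abstract describes, such a learned distance would only guide which states to explore from, while the policy itself is trained separately.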

Accelerated Robot Learning via Human Brain Signals

Accelerating Reinforcement Learning Agent with EEG-based Implicit Human Feedback

Accelerating Reinforcement Learning using EEG-based implicit human feedback

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning

Deep Reinforcement Learning from Error-Related Potentials Via an EEG-based Brain-Computer Interface

Hybrid Reinforcement Learning Based on Human Preference and Advice for Efficient Robot Skill Learning

Combining brain-computer interfaces with deep reinforcement learning for robot training: a feasibility study in a simulation environment

End-to-End Robotic Reinforcement Learning without Reward Engineering

Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback

Efficient Hindsight Reinforcement Learning Using Demonstrations for Robotic Tasks with Sparse Rewards

Achieving Sample-Efficient Learning of Long-Horizon Sparse-Reward Robotic Tasks with Base Controllers

A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning

Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards

Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations

Real-World Human-Robot Collaborative Reinforcement Learning

Addressing Reward Engineering For Deep Reinforcement Learning On Multi-Stage Task

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers

ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning

Deep Reinforcement Learning for an Anthropomorphic Robotic Arm under Sparse Reward Tasks