Abstract: Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges: the difficulty of providing well "shaped" rewards, and the difficulty of continual reset-free training. In this work, we describe a system for real-world reinforcement learning that enables agents to show continual improvement by training directly in the real world without requiring painstaking effort to hand-design reward functions or reset mechanisms. Our system leverages occasional non-expert human-in-the-loop feedback from remote users to learn informative distance functions to guide exploration, while leveraging a simple self-supervised learning algorithm for goal-directed policy learning. We show that in the absence of resets, it is particularly important to account for the current "reachability" of the exploration policy when deciding which regions of the space to explore. Based on this insight, we instantiate a practical learning system, GEAR, which enables robots to simply be placed in real-world environments and left to train autonomously without interruption. The system streams robot experience to a web interface, requiring only occasional asynchronous feedback from remote, crowdsourced, non-expert humans in the form of binary comparative feedback. We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and in the real world. Project website: https://guided-exploration-autonomous-rl.github.io/GEAR/

An unsupervised autonomous learning framework for goal-directed behaviours in dynamic contexts

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies

Self-Adapting Goals Allow Transfer of Predictive Models to New Tasks

Unsupervised Control Through Non-Parametric Discriminative Rewards

Multigoal Visual Navigation With Collision Avoidance via Deep Reinforcement Learning

Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

Self-Learning Robot Autonomous Navigation with Deep Reinforcement Learning Techniques

Autonomous learning of multiple, context-dependent tasks

TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations

A Goal-Conditioned Reinforcement Learning Algorithm with Environment Modeling

Multi-USV Dynamic Navigation and Target Capture: A Guided Multi-Agent Reinforcement Learning Approach

Life, uh, Finds a Way: Systematic Neural Search

Adaptive goal selection for agents in dynamic environments

Emergent Solutions to High-Dimensional Multitask Reinforcement Learning

Task-Oriented Self-Imitation Learning for Robotic Autonomous Skill Acquisition

An on-line algorithm for dynamic reinforcement learning and planning in reactive environments

Evolving hierarchical memory-prediction machines in multi-task reinforcement learning

Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback

Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots

ODGR: Online Dynamic Goal Recognition