Abstract: In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on. An aspirational goal is to construct self-improving robots: robots that can learn and improve on their own, from autonomous interaction with minimal human supervision or oversight. Such robots could collect and train on much larger datasets, and thus learn more robust and performant policies. While reinforcement learning offers a framework for such autonomous learning via trial and error, practical realizations end up requiring extensive human supervision for reward function design and repeated resetting of the environment between episodes of interaction. In this work, we propose MEDAL++, a novel design for self-improving robotic systems: given a small set of expert demonstrations at the start, the robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations. The policy and reward function are learned end-to-end from high-dimensional visual inputs, bypassing the need for explicit state estimation or task-specific pre-training for visual encoders used in prior work. We first evaluate our proposed algorithm on EARL, a simulated non-episodic benchmark, finding that MEDAL++ is both more data-efficient and achieves up to 30% better final performance compared to state-of-the-art vision-based methods. Our real-robot experiments show that MEDAL++ can be applied to manipulation problems in larger environments than those considered in prior work, and that autonomous self-improvement can improve the success rate by 30-70% over behavior cloning on just the expert data. Code, training and evaluation videos, and a brief overview are available at: https://architsharma97.github.io/self-improving-robots/
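The "do and undo" practice loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `schedule` and `classifier_reward`, the fixed switching interval, and the reward form -log(1 - p) are all assumptions chosen for illustration. The sketch shows only two ideas: a forward and a backward policy taking turns within one uninterrupted run (no environment resets), and a reward derived from a classifier's estimate that a state resembles the demonstration data.

```python
import math


def schedule(total_steps, switch_every):
    """Return the active policy name for each step of one uninterrupted run.

    Hypothetical helper: the forward policy attempts the task, the backward
    policy undoes it, and control alternates every `switch_every` steps
    without any environment reset in between.
    """
    names = ("forward", "backward")
    return [names[(t // switch_every) % 2] for t in range(total_steps)]


def classifier_reward(p_demo, eps=1e-6):
    """Map a classifier probability to a reward (assumed form: -log(1 - p)).

    `p_demo` is the classifier's probability that a state looks like the
    demonstration data. Clipping keeps the logarithm finite at p = 0 or 1.
    """
    p = min(max(p_demo, eps), 1.0 - eps)
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Six steps with a switch every two steps.
    print(schedule(total_steps=6, switch_every=2))
    # States that look more like the demonstrations earn a higher reward.
    print(classifier_reward(0.1), classifier_reward(0.9))
```

In a real system the switching rule, the classifier, and both policies would be learned from images; here they are replaced by fixed placeholders so the control flow stays visible.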

Learning Robotic Skills Via Self-Imitation and Guide Reward

A data-efficient goal-directed deep reinforcement learning method for robot visuomotor skill

Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control

Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

RLIF: Interactive Imitation Learning as Reinforcement Learning

Self-imitation guided goal-conditioned reinforcement learning

Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning

SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

Autonomous Learning and Navigation of Mobile Robots Based on Deep Reinforcement Learning

CRIL: Continual Robot Imitation Learning via Generative and Prediction Model

Affordance-Guided Reinforcement Learning via Visual Prompting

Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards

Intrinsically Motivated Multi-Goal Reinforcement Learning Using Robotics Environment Integrated with OpenAI Gym

Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance

GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving

On-Robot Reinforcement Learning with Goal-Contrastive Rewards

Improved Deep Reinforcement Learning with Expert Demonstrations for Urban Autonomous Driving

Imitation Bootstrapped Reinforcement Learning

Human skill knowledge guided global trajectory policy reinforcement learning method