Abstract: In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on. An aspirational goal is to construct self-improving robots: robots that can learn and improve on their own, from autonomous interaction with minimal human supervision or oversight. Such robots could collect and train on much larger datasets, and thus learn more robust and performant policies. While reinforcement learning offers a framework for such autonomous learning via trial and error, practical realizations end up requiring extensive human supervision for reward function design and repeated resetting of the environment between episodes of interaction. In this work, we propose MEDAL++, a novel design for self-improving robotic systems: given a small set of expert demonstrations at the start, the robot autonomously practices the task by learning to both do and undo the task, while simultaneously inferring the reward function from the demonstrations. The policy and reward function are learned end-to-end from high-dimensional visual inputs, bypassing the need for the explicit state estimation or task-specific pre-training of visual encoders used in prior work. We first evaluate our proposed algorithm on EARL, a simulated non-episodic benchmark, finding that MEDAL++ is both more data-efficient and achieves up to 30% better final performance compared to state-of-the-art vision-based methods. Our real-robot experiments show that MEDAL++ can be applied to manipulation problems in larger environments than those considered in prior work, and that autonomous self-improvement can improve the success rate by 30-70% over behavior cloning on just the expert data. Code, training and evaluation videos, and a brief overview are available at: https://architsharma97.github.io/self-improving-robots/

Intrinsically Motivated Multi-Goal Reinforcement Learning Using Robotics Environment Integrated with OpenAI Gym

Relay Hindsight Experience Replay: Self-guided continual reinforcement learning for sequential object manipulation tasks with sparse rewards

An Open-Source Multi-goal Reinforcement Learning Environment for Robotic Manipulation with Pybullet

Solving Robotic Manipulation With Sparse Reward Reinforcement Learning Via Graph-Based Diversity and Proximity

Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback

Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback

Hierarchical reinforcement learning for handling sparse rewards in multi-goal navigation

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers

Research on Complex Robot Manipulation Tasks Based on Hindsight Trust Region Policy Optimization

Contact Energy Based Hindsight Experience Prioritization

ACDER: Augmented Curiosity-Driven Experience Replay

Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Quantile Regression Hindsight Experience Replay

Combining Hindsight with Goal-enhanced Prediction for Multi-goal Reinforcement Learning

Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning

The Ingredients of Real-World Robotic Reinforcement Learning

On-Robot Reinforcement Learning with Goal-Contrastive Rewards

Task-Oriented Deep Reinforcement Learning for Robotic Skill Acquisition and Control

Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning

Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

Enhancing Robotic Manipulation: Harnessing the Power of Multi-Task Reinforcement Learning and Single Life Reinforcement Learning in Meta-World