Abstract: Identifying algorithms that flexibly and efficiently discover temporally-extended multi-phase plans is an essential step for the advancement of robotics and model-based reinforcement learning. The core problem of long-range planning is finding an efficient way to search through the tree of possible action sequences. Existing non-learned planning solutions from the Task and Motion Planning (TAMP) literature rely on the existence of logical descriptions for the effects and preconditions for actions. This constraint allows TAMP methods to efficiently reduce the tree search problem but limits their ability to generalize to unseen and complex physical environments. In contrast, deep reinforcement learning (DRL) methods use flexible neural-network-based function approximators to discover policies that generalize naturally to unseen circumstances. However, DRL methods struggle to handle the very sparse reward landscapes inherent to long-range multi-step planning situations. Here, we propose the Curious Sample Planner (CSP), which fuses elements of TAMP and DRL by combining a curiosity-guided sampling strategy with imitation learning to accelerate planning. We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks.

SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation

Efficient Object Manipulation to an Arbitrary Goal Pose: Learning-based Anytime Prioritized Planning

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

Jacta: A Versatile Planner for Learning Dexterous and Whole-body Manipulation

LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation

Learning to combine primitive skills: A step towards versatile robotic manipulation

Flexible and Efficient Long-Range Planning Through Curious Exploration

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition

PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers

Guided Imitation of Task and Motion Planning

Safety Guaranteed Manipulation Based on Reinforcement Learning Planner and Model Predictive Control Actor

Explicit-Implicit Subgoal Planning for Long-Horizon Tasks with Sparse Reward

Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Predictive Multi-Agent-Based Planning and Landing Controller for Reactive Dual-Arm Manipulation

Efficient Learning of High Level Plans from Play

Learning to Imagine Manipulation Goals for Robot Task Planning

Collaborative motion planning for multi-manipulator systems through Reinforcement Learning and Dynamic Movement Primitives

Learning Robotic Manipulation through Visual Planning and Acting