Abstract: Identifying algorithms that flexibly and efficiently discover temporally extended multi-phase plans is an essential step for the advancement of robotics and model-based reinforcement learning. The core problem of long-range planning is finding an efficient way to search through the tree of possible action sequences. Existing non-learned planning solutions from the Task and Motion Planning (TAMP) literature rely on the existence of logical descriptions of the effects and preconditions of actions. This constraint allows TAMP methods to efficiently reduce the tree search problem but limits their ability to generalize to unseen and complex physical environments. In contrast, deep reinforcement learning (DRL) methods use flexible neural-network-based function approximators to discover policies that generalize naturally to unseen circumstances. However, DRL methods struggle to handle the very sparse reward landscapes inherent to long-range multi-step planning situations. Here, we propose the Curious Sample Planner (CSP), which fuses elements of TAMP and DRL by combining a curiosity-guided sampling strategy with imitation learning to accelerate planning. We show that CSP can efficiently discover interesting and complex temporally extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all, or solve them only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer, so that the exploration policies learned during experience with one task can help improve efficiency on related tasks.

RTP-Q: a Reinforcement Learning System with an Active Exploration Planning Structure for Enhancing the Convergence Rate

Efficient Autonomous Exploration of Unknown Environment Using Regions Segmentation and VRP

Spacecraft Attitude Maneuver Planning Based on Deep Reinforcement Learning under Complex Constraints

Deep Reinforcement Learning Integrated RRT Algorithm for Path Planning

An Enhanced Hierarchical Planning Framework for Multi-Robot Autonomous Exploration

Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning

An Incremental Optimization Approach to Address the Spatiotemporal Reward Coupling Effects in Deep Reinforcement Learning for Path Planning

Deep Reinforcement Learning-based Large-scale Robot Exploration

Hierarchical path planner for unknown space exploration using reinforcement learning-based intelligent frontier selection

Hybrid Bidirectional Rapidly Exploring Random Tree Path Planning Algorithm with Reinforcement Learning

Reinforcement Learning with Probabilistically Complete Exploration

Application research of RRT algorithm path planning based on reinforcement learning

Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees

Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Improved Robot Path Planning Method Based on Deep Reinforcement Learning

A deep reinforcement learning based method for real-time path planning and dynamic obstacle avoidance

A Mapless Local Path Planning Approach Using Deep Reinforcement Learning Framework

Flexible and Efficient Long-Range Planning Through Curious Exploration

Fast Path Planning for Long-Range Planetary Roving Based on a Hierarchical Framework and Deep Reinforcement Learning

Guiding Robot Exploration in Reinforcement Learning via Automated Planning