Abstract: Autonomously exploring and mapping unknown indoor scenes is a prerequisite for many robotic tasks. Deep reinforcement learning-based methods can teach a robot to leverage the structural regularities of indoor scenes through interaction with the environment, which can yield an efficient and robust exploration strategy. Existing approaches either directly control the robot's motions, which leads to long decision sequences and requires an enormous number of training samples, or indirectly specify a long-term goal without guaranteeing that the goal is reachable, which also hinders training. To overcome these problems, we propose a novel autonomous scene exploration method that uses an experience enhancement algorithm to accelerate policy training and generate highly efficient exploration targets. We first combine an off-policy reinforcement learning algorithm with an experience replay buffer. Then, a global exploration policy specifies a long-term goal. Next, an incremental heuristic pathfinding algorithm plans a collision-free path to the long-term goal. As the robot follows the path, we divide it into several segments and evaluate the reward of each subpath, which is used to further improve the existing experiences. Finally, the experiences are filtered based on their temporal difference error and stored in the experience replay buffer. The proposed method can extract correct behaviors from past failures and designate reachable long-term goals. Experiments show that our approach effectively improves the training efficiency of the global exploration policy and the final performance of the overall system.
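The segment-and-filter step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`split_path`, `enhance_and_filter`), the experience layout, the reward callable, the value callables, and the rule of keeping experiences whose absolute temporal difference error exceeds a threshold are all assumptions made for illustration. The abstract does not specify the segmentation scheme, the reward definition, or the filtering criterion.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

Cell = Tuple[int, int]
# Hypothetical experience layout: (start cell, goal cell, reward, next cell).
Experience = Tuple[Cell, Cell, float, Cell]


def split_path(path: Sequence[Cell], n_segments: int) -> List[List[Cell]]:
    """Split a path into contiguous subpaths that share their endpoints."""
    if len(path) < 2 or n_segments < 1:
        return []
    n = min(n_segments, len(path) - 1)
    # Evenly spaced cut indices; the first is 0 and the last is len(path) - 1.
    cuts = [i * (len(path) - 1) // n for i in range(n + 1)]
    return [list(path[a:b + 1]) for a, b in zip(cuts, cuts[1:])]


def enhance_and_filter(
    path: Sequence[Cell],
    n_segments: int,
    reward_fn: Callable[[List[Cell]], float],  # assumed subpath reward
    q_fn: Callable[[Cell, Cell], float],       # assumed Q(state, goal)
    v_fn: Callable[[Cell], float],             # assumed V(next state)
    gamma: float = 0.99,
    td_threshold: float = 0.1,
) -> List[Experience]:
    """Turn each subpath into an experience and keep those with a large TD error."""
    kept: List[Experience] = []
    for segment in split_path(path, n_segments):
        start, end = segment[0], segment[-1]
        reward = reward_fn(segment)
        # One-step TD error, treating the subpath endpoint as the goal reached.
        td_error = reward + gamma * v_fn(end) - q_fn(start, end)
        # Assumption: experiences with a small TD error are discarded.
        if abs(td_error) >= td_threshold:
            kept.append((start, end, reward, end))
    return kept


# Usage: a straight 7-cell path, with reward equal to the number of steps taken.
path = [(0, i) for i in range(7)]
replay_buffer: deque = deque(maxlen=10_000)
replay_buffer.extend(
    enhance_and_filter(
        path,
        n_segments=3,
        reward_fn=lambda seg: float(len(seg) - 1),
        q_fn=lambda s, g: 0.0,
        v_fn=lambda s: 0.0,
    )
)
```

Because every subpath ends at a cell the robot actually reached, each generated experience has a reachable goal by construction, which is one way to read the abstract's claim that correct behaviors can be extracted even from episodes that did not reach the original long-term goal.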

Regioned Episodic Reinforcement Learning

Episodic Reinforcement Learning with Expanded State-reward Space

Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning

Sample Efficient Reinforcement Learning Method Via High Efficient Episodic Memory

Episodic Reinforcement Learning with Associative Memory

Efficient Exploration in Resource-Restricted Reinforcement Learning

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

Neural Episodic Control with State Abstraction

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Explore with Dynamic Map: Graph Structured Reinforcement Learning

Learning Long-Term Reward Redistribution via Randomized Return Decomposition

Clustered Reinforcement Learning

Deep Reinforcement Learning with Parametric Episodic Memory

Episodic Memory Deep Q-Networks

EMExplorer: an Episodic Memory Enhanced Autonomous Exploration Strategy with Voronoi Domain Conversion and Invalid Action Masking

SEREN: Knowing When to Explore and When to Exploit

Two-Stage Evolutionary Reinforcement Learning for Enhancing Exploration and Exploitation

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control

Autonomous Scene Exploration Using Experience Enhancement

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration