Abstract: Applying reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, because every interaction with the environment is costly for the agent. One way to improve sample efficiency is to extract knowledge from existing samples and use it to guide exploration. Typical RL algorithms explore by using task-specific knowledge or by adding exploration noise. These methods are limited to the level of the current policy improvement and lack long-term planning. We propose a novel active exploration deep RL algorithm for continuous action spaces, named active exploration deep reinforcement learning (AEDRL). Our method uses a Gaussian process as the dynamic model, which gives a probabilistic description of each predicted sample. Action selection is formulated as the solution of an optimization problem whose objective is designed to select the samples that minimize the uncertainty of the dynamic model. Active exploration is achieved through this long-term optimized action selection, which gives learning more guidance and enables the agent to explore more interesting regions of the action space. The proposed AEDRL algorithm is evaluated on several robotic control tasks, including the classic pendulum problem and five complex articulated robots. AEDRL learns a controller in fewer episodes and demonstrates good performance and sample efficiency.
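
As a hedged illustration of the probabilistic prediction described in the abstract, the standard Gaussian process regression posterior is written out below. The notation is assumed here and is not taken from the paper: $x = (s, a)$ is a state-action input, $X$ and $y$ are the observed inputs and next-state targets, $k$ is the kernel, $K = k(X, X)$, $k_* = k(X, x_*)$, and $\sigma_n^2$ is the observation noise variance. The final expression is only one plausible way to formalize "long-term" uncertainty-driven action selection over an assumed horizon $H$; the paper's actual objective may differ.

$$
\mu(x_*) = k_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} y,
\qquad
\sigma^2(x_*) = k(x_*, x_*) - k_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} k_*
$$

$$
a_{t:t+H-1}^{*} = \arg\max_{a_{t:t+H-1}} \sum_{h=0}^{H-1} \sigma^2\!\left(\hat{s}_{t+h}, a_{t+h}\right),
\qquad
\hat{s}_{t} = s_t, \quad \hat{s}_{t+h+1} = \mu\!\left(\hat{s}_{t+h}, a_{t+h}\right)
$$

Choosing the inputs where $\sigma^2$ is largest targets the samples whose observation is expected to reduce the model's uncertainty the most.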

What problem does this paper attempt to address?

The main problem that this paper attempts to solve is to improve the sample efficiency of Reinforcement Learning (RL) in continuous action spaces. Specifically, the author proposes a new Active Exploration Deep Reinforcement Learning (AEDRL) algorithm, aiming to achieve more efficient exploration by optimizing the dynamic model, thereby reducing the number of interactions with the environment and improving the learning efficiency of the agent.

### Problem Background

Traditional reinforcement learning algorithms face the problem of low sample efficiency in robot control tasks, that is, a large number of environmental interactions are required to learn effective strategies. In order to improve sample efficiency, existing methods usually rely on task-specific knowledge or add exploration noise, but these methods lack long-term planning ability and cannot effectively balance exploration and exploitation.

### Solution

The AEDRL algorithm solves the above problems in the following ways:

1. **Introducing Gaussian Process (GP)**: Using GP to model the dynamic system can describe the uncertainty of predicted samples in a probabilistic way.
2. **Optimizing the objective design**: Formulating action selection as an optimization problem, and the optimization objective is to select samples that can minimize the uncertainty of the dynamic model. This makes the exploration process more instructive and can better balance exploration and exploitation.
3. **Long-term optimized action selection**: By optimizing the action selection of long-term prediction, more effective active exploration is achieved.
4. **Combining the Actor-Critic framework**: AEDRL is based on the Actor-Critic framework, and updates the policy parameters by maximizing the information entropy, enabling the agent to learn a better control strategy within a fewer number of interactions.
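
The first three steps above can be illustrated with a minimal sketch. It is not the paper's implementation: the RBF kernel, the fixed hyperparameters, the random-shooting search over action sequences, the toy linear dynamics, and all names (`GPDynamics`, `select_exploratory_action`, `make_toy_data`) are assumptions made for illustration. The sketch fits a GP to (state, action) → next-state data, rolls candidate action sequences forward with the GP mean, and returns the first action of the sequence with the largest accumulated predictive variance.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel (unit signal variance) between rows of A and B."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class GPDynamics:
    """GP regression from (state, action) inputs to next-state targets (assumed setup)."""

    def __init__(self, noise_var=1e-2):
        self.noise_var = noise_var

    def fit(self, X, Y):
        self.X = X
        K = rbf_kernel(X, X) + self.noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        # alpha = (K + noise_var * I)^{-1} Y, computed via the Cholesky factor.
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))
        return self

    def predict(self, X_new):
        """Return the posterior mean (m, state_dim) and latent variance (m,)."""
        K_s = rbf_kernel(X_new, self.X)
        mean = K_s @ self.alpha
        v = np.linalg.solve(self.L, K_s.T)
        # Prior variance is 1.0 because the kernel has unit signal variance.
        var = 1.0 - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)


def select_exploratory_action(model, state, horizon=3, n_candidates=64,
                              action_dim=1, rng=None):
    """Random-shooting search for the action sequence with the most model uncertainty."""
    rng = np.random.default_rng() if rng is None else rng
    # Candidate sequences in [-1, 1], shape (n_candidates, horizon, action_dim).
    plans = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    states = np.tile(state, (n_candidates, 1))
    total_var = np.zeros(n_candidates)
    for t in range(horizon):
        inputs = np.hstack([states, plans[:, t, :]])
        states, var = model.predict(inputs)  # roll forward with the GP mean
        total_var += var
    best = int(np.argmax(total_var))
    return plans[best, 0], float(total_var[best])


def make_toy_data(rng, n=30):
    """Hypothetical 1-D system s' = s + 0.1 * a, sampled only at small actions."""
    states = rng.uniform(-1.0, 1.0, size=(n, 1))
    actions = rng.uniform(-0.2, 0.2, size=(n, 1))
    return np.hstack([states, actions]), states + 0.1 * actions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, Y = make_toy_data(rng)
    model = GPDynamics().fit(X, Y)
    action, score = select_exploratory_action(model, np.array([0.0]), rng=rng)
    print("exploratory action:", action, "accumulated variance:", score)
```

Because the toy data only covers actions in [-0.2, 0.2], the predictive variance grows for larger-magnitude actions, so the search tends to favor actions outside the region already sampled. A practical version would also learn the kernel hyperparameters and combine this exploration signal with the Actor-Critic policy update, which this sketch omits.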
### Experimental Verification

The paper verifies the effectiveness of AEDRL on multiple robot control tasks (the classic pendulum problem and five complex articulated robots). The experimental results show that AEDRL achieves better performance in fewer training episodes, demonstrating its advantages in sample efficiency and performance.

### Main Contributions

- Proposing a new AEDRL algorithm specifically for continuous action space problems.
- Introducing a GP-based active exploration module, which makes exploration more informed.
- Achieving efficient optimization of the dynamic model by maximizing the information entropy.
- Verifying the effectiveness of the algorithm on multiple robot control tasks.

In general, this paper improves the sample efficiency of reinforcement learning in continuous action spaces by improving the exploration mechanism, providing a more efficient solution for complex tasks such as robot control.

Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction
