Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction

Dongfang Zhao, Xu Huanshi, Zhang Xun
DOI: https://doi.org/10.1007/s44196-023-00389-1
IF: 2.259
2024-01-10
International Journal of Computational Intelligence Systems
Abstract: The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since interaction with the environment is costly for the agent. One way to improve sample efficiency is to extract knowledge from existing samples and use it for exploration. Typical RL algorithms achieve exploration using task-specific knowledge or by adding exploration noise. These methods are limited to the current level of policy improvement and lack long-term planning. We propose a novel active exploration deep RL algorithm for continuous action space problems, named active exploration deep reinforcement learning (AEDRL). Our method uses a Gaussian process to model the system dynamics, enabling a probabilistic description of predicted samples. Action selection is formulated as the solution of an optimization problem whose objective is specifically designed to select samples that minimize the uncertainty of the dynamic model. Active exploration is achieved through long-term optimized action selection. Because it accounts for the long term, this exploration method gives more guidance to learning and enables agents to explore more interesting regions of the action space. The proposed AEDRL algorithm is evaluated on several robotic control tasks, including the classic pendulum problem and five complex articulated robots. AEDRL learns a controller in fewer episodes and demonstrates good performance and sample efficiency.
computer science, artificial intelligence, interdisciplinary applications
What problem does this paper attempt to address?
The main problem this paper addresses is the low sample efficiency of reinforcement learning (RL) in continuous action spaces. The authors propose a new algorithm, Active Exploration Deep Reinforcement Learning (AEDRL), which explores more efficiently by selecting actions that reduce the uncertainty of a learned dynamic model. This reduces the number of interactions with the environment and improves the agent's learning efficiency.

### Problem Background

Traditional RL algorithms suffer from low sample efficiency in robot control tasks: learning an effective policy requires a large number of interactions with the environment. To improve sample efficiency, existing methods usually rely on task-specific knowledge or add exploration noise, but these methods lack long-term planning and cannot effectively balance exploration and exploitation.

### Solution

AEDRL addresses these problems as follows:

1. **Gaussian process (GP) dynamic model**: A GP models the system dynamics, giving a probabilistic description of the uncertainty of predicted samples.
2. **Designed optimization objective**: Action selection is formulated as an optimization problem whose objective is to select samples that minimize the uncertainty of the dynamic model. This makes exploration more directed and better balances exploration and exploitation.
3. **Long-term optimized action selection**: Actions are chosen by optimizing over long-horizon predictions, which yields more effective active exploration.
4. **Actor-Critic framework**: AEDRL builds on the Actor-Critic framework and updates the policy parameters by maximizing information entropy, enabling the agent to learn a better control policy with fewer interactions.
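To make the Solution steps above more concrete, here is a minimal pure-Python sketch of GP-based, uncertainty-driven action selection. It is an illustration under stated assumptions, not the authors' implementation: the 1-D toy dynamics `true_step`, the RBF kernel with fixed hyperparameters, the random-shooting optimizer, the "sum of predictive variances over the horizon" objective, and the names `GPDynamics` and `select_action` are all hypothetical choices made for this example. The GP predicts the next state from a (state, action) pair together with a predictive variance, and the planner picks the action sequence whose imagined rollout is most uncertain, so that executing its first action yields the most informative new sample for the model.

```python
import math
import random


def rbf(z1, z2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel over state-action inputs z = (x, a)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(z1, z2))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPDynamics:
    """Hypothetical GP model: (state, action) -> next state, with predictive variance."""

    def __init__(self, inputs, targets, noise_var=1e-2):
        self.inputs = inputs
        n = len(inputs)
        K = [[rbf(inputs[i], inputs[j]) + (noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.L = cholesky(K)
        # alpha = K^{-1} y, computed with two triangular solves.
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, targets))

    def predict(self, z):
        k_star = [rbf(z, zi) for zi in self.inputs]
        mean = sum(k * w for k, w in zip(k_star, self.alpha))
        v = solve_lower(self.L, k_star)
        var = rbf(z, z) - sum(vi * vi for vi in v)  # variance of the latent dynamics
        return mean, max(var, 1e-12)


def select_action(gp, state, horizon=5, n_candidates=64, a_max=1.0, rng=random):
    """Random shooting: return the first action of the sampled sequence whose
    imagined `horizon`-step rollout has the largest summed predictive variance."""
    best_seq, best_score = None, -math.inf
    for _ in range(n_candidates):
        seq = [rng.uniform(-a_max, a_max) for _ in range(horizon)]
        x, score = state, 0.0
        for a in seq:
            mean, var = gp.predict((x, a))
            score += var  # accumulate model uncertainty along the rollout
            x = mean      # propagate only the predictive mean (a simplification)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq[0], best_score


def true_step(x, a):
    """Hypothetical 1-D toy dynamics standing in for the real robot."""
    return x + 0.1 * a - 0.05 * math.sin(x)


# Collect a few random transitions, fit the GP, then pick an exploratory action.
rng = random.Random(0)
data_in, data_out, x = [], [], 0.0
for _ in range(10):
    a = rng.uniform(-1.0, 1.0)
    x_next = true_step(x, a)
    data_in.append((x, a))
    data_out.append(x_next)
    x = x_next

gp = GPDynamics(data_in, data_out)
action, score = select_action(gp, x, rng=rng)
print(f"exploratory action: {action:.3f}, summed predictive variance: {score:.4f}")
```

In a full loop, the selected action would be executed on the real system, the new transition appended to `data_in`/`data_out`, and the GP refit, so each step targets the region where the model is least certain. Propagating only the mean keeps the sketch short; a more faithful long-horizon treatment would also propagate the state uncertainty through the rollout.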
### Experimental Verification

The paper evaluates AEDRL on several robot control tasks, including the classic pendulum problem and five complex articulated robots. The results show that AEDRL reaches better performance in fewer training episodes, demonstrating its advantages in sample efficiency and performance.

### Main Contributions

- A new algorithm, AEDRL, designed specifically for continuous action space problems.
- A GP-based active exploration module that makes exploration more directed.
- Efficient optimization of the dynamic model by maximizing information entropy.
- Experimental validation of the algorithm on multiple robot control tasks.

In general, this paper improves the sample efficiency of RL in continuous action spaces by improving the exploration mechanism, offering a more efficient approach for complex tasks such as robot control.
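The contribution "maximizing the information entropy" and the goal of minimizing dynamic-model uncertainty can be related through two standard Gaussian-process identities. They are offered as background under stated assumptions, not as the exact objective used by AEDRL: assume a GP trained on data $\mathcal{D}$ with predictive variance $\sigma^2(z)$ of the latent dynamics $f$ at a state-action input $z$, and an observation $y = f(z) + \varepsilon$ with Gaussian noise $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$. Then the predictive entropy of $y$ and the information that $y$ carries about $f$ are

```latex
H\left[y \mid z, \mathcal{D}\right]
  = \tfrac{1}{2}\log\!\left(2\pi e\,\bigl(\sigma^2(z) + \sigma_n^2\bigr)\right),
\qquad
I\left(y;\, f \mid z, \mathcal{D}\right)
  = H\left[y \mid z, \mathcal{D}\right] - H\left[y \mid f, z\right]
  = \tfrac{1}{2}\log\!\left(1 + \frac{\sigma^2(z)}{\sigma_n^2}\right).
```

Both quantities increase monotonically with $\sigma^2(z)$, so among candidate inputs the one with the highest predictive entropy is also the one whose observation is expected to reduce uncertainty about the dynamics the most.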