Bidirectional Obstacle Avoidance Enhancement‐Deep Deterministic Policy Gradient: A Novel Algorithm for Mobile‐Robot Path Planning in Unknown Dynamic Environments

Junxiao Xue, Shiwen Zhang, Yafei Lu, Xiaoran Yan, Yuanxun Zheng
DOI: https://doi.org/10.1002/aisy.202300444
IF: 7.298
2024-02-08
Advanced Intelligent Systems
Abstract: A novel deep reinforcement learning‐based method called bidirectional obstacle avoidance enhancement‐deep deterministic policy gradient (BOAE‐DDPG) for mobile‐robot path planning in unknown dynamic environments is proposed. The core BOAE mechanism is inspired by dynamic psychology, making BOAE‐DDPG better at learning obstacle avoidance without relying on environmental information. In addition, new assisted reward factors designed for path planning promote learning and convergence. Real‐time path planning in unknown dynamic environments is a significant challenge for mobile robots. Many researchers have attempted to solve this problem by introducing deep reinforcement learning, which trains agents through interaction with their environments. A method called BOAE‐DDPG, which combines the novel bidirectional obstacle avoidance enhancement (BOAE) mechanism with the deep deterministic policy gradient (DDPG) algorithm, is proposed to enhance the learning ability of obstacle avoidance. Inspired by the analysis of the reaction advantage in dynamic psychology, the BOAE mechanism focuses on obstacle‐avoidance reactions from the state and action. The cross‐attention mechanism is incorporated to enhance the attention to valuable obstacle‐avoidance information. Meanwhile, the obstacle‐avoidance behavioral advantage is separately estimated using the modified dueling network. Based on the learning goals of the mobile robot, new assistive reward factors are incorporated into the reward function to promote learning and convergence. The proposed method is validated through several experiments conducted using the simulation platform Gazebo. The results show that the proposed method is suitable for path planning tasks in unknown environments and has an excellent obstacle‐avoidance learning capability.
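The abstract names cross‐attention as the component that highlights valuable obstacle‐avoidance information but does not say how queries, keys, and values are formed. As a rough illustration only, the sketch below implements generic scaled dot‐product cross‐attention in plain Python, where the query comes from one source and the keys and values from another. The function names, the absence of learned projections, and the toy vectors are all assumptions made for this example and are not taken from the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (no learned projections).

    queries come from one source; keys and values come from another.
    Returns the attended outputs and the attention weights per query.
    """
    d = len(keys[0])  # key dimension used for scaling
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Hypothetical toy data: one action-derived query attending over three
# state-derived feature vectors (e.g. features from different sensor sectors).
action_query = [[1.0, 0.0]]
state_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
state_values = [[0.2, 0.9], [0.5, 0.5], [0.9, 0.1]]

attended, attn = cross_attention(action_query, state_keys, state_values)
```

In this toy setup the key most aligned with the query receives the largest weight, so its value vector dominates the attended output. A trained network would typically learn projection matrices for the queries, keys, and values rather than use raw vectors as done here.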