An Improved Algorithm of Robot Path Planning in Complex Environment Based on Double DQN

Fei Zhang, Chaochen Gu, Feng Yang
DOI: https://doi.org/10.48550/arXiv.2107.11245
2021-07-23
Abstract: According to our experiments, Deep Q Network (DQN) has several limitations when applied to path planning in environments with a number of dilemmas. The reward function may be hard to model, and successful experience transitions are difficult to find in experience replay. In this context, this paper proposes an improved Double DQN (DDQN) that addresses these problems by drawing on A* and Rapidly-Exploring Random Tree (RRT). To obtain rich experiences in experience replay, the initialization of the robot in each training round is redefined based on the RRT strategy. In addition, the reward for free positions is specially designed, following the definition of position cost in A*, to accelerate the learning process. The simulation results validate the efficiency of the improved DDQN: the robot successfully learns obstacle avoidance and optimal path planning in settings where DQN or DDQN has no effect.
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the challenges of robot path planning in complex environments with Double DQN (Double Deep Q-Network, DDQN). Specifically, it identifies two main problems with using the traditional DDQN algorithm for path planning:

1. **Problem of experience initialization**: After each training round, the robot is always reset to the given starting point, so it cannot gather new experience from diverse parts of the environment. When obstacles surround the starting point, the robot is likely to get trapped and has difficulty escaping.
2. **Problem of reward function design**: Every position in free space yields the same reward. This makes it difficult for the model to converge, because the reward gives the robot no signal for selecting the optimal path.

To solve these problems, the paper proposes an improved DDQN algorithm. The main improvements are:

- **Change of initialization strategy**: Drawing on the idea of Rapidly-Exploring Random Tree (RRT), the robot is no longer always reset to the fixed starting point after each training round. Instead, with a certain probability, a free position is randomly selected as the new starting point. This increases the diversity of experience replay and helps the robot learn more strategies.
- **Design of reward function**: Following the definition of path cost in the A* algorithm, a new reward function is designed. It takes into account both the change in distance between the current position and the end point and the change in distance between the current position and the starting point, which accelerates the convergence of the model.

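
The two improvements can be illustrated with a minimal Python sketch. This is not the authors' implementation: the grid encoding, the function names `sample_start` and `free_cell_reward`, the probability `p_random`, the weights `w_goal` and `w_start`, and the use of Euclidean distance are all assumptions made for illustration. The summary only states that the start is randomized with some probability and that the reward depends on the changes in distance to the end point and to the starting point.

```python
import math
import random

FREE, OBSTACLE = 0, 1  # assumed grid encoding


def sample_start(grid, start, goal, p_random=0.5, rng=random):
    """Pick the robot's initial cell for one training round.

    With probability p_random (an assumed hyperparameter), return a
    uniformly random free cell other than the goal; otherwise return
    the given starting point.
    """
    if rng.random() < p_random:
        free_cells = [
            (r, c)
            for r, row in enumerate(grid)
            for c, cell in enumerate(row)
            if cell == FREE and (r, c) != goal
        ]
        if free_cells:
            return rng.choice(free_cells)
    return start


def euclid(a, b):
    """Euclidean distance between two (row, col) cells."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def free_cell_reward(prev, nxt, start, goal, w_goal=1.0, w_start=0.5):
    """Shaped reward for a move between two free cells.

    Rewards getting closer to the goal and farther from the start.
    The linear combination and the weights are hypothetical.
    """
    toward_goal = euclid(prev, goal) - euclid(nxt, goal)
    away_from_start = euclid(nxt, start) - euclid(prev, start)
    return w_goal * toward_goal + w_start * away_from_start
```

Under these assumptions, a step that shortens the distance to the end point while lengthening the distance from the starting point earns a positive reward, and the reverse step earns the same amount with the opposite sign, so free cells are no longer indistinguishable to the agent.
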
With these improvements, the paper verifies the path-planning ability of the improved DDQN algorithm in complex environments: the robot effectively avoids obstacles and finds the optimal path. The experimental results show that the improved DDQN significantly outperforms the traditional DDQN algorithm in both training efficiency and path-planning performance.
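
For background, the standard Double DQN update that the improved algorithm builds on can be sketched as follows. The summary does not spell it out, so this is the general, well-known form rather than anything specific to this paper: the online network selects the next action and the target network evaluates it. The function names and the default `gamma` are illustrative assumptions, and plain Python lists stand in for network outputs.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Plain DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target.

    The online network's Q-values for the next state choose the action;
    the target network's Q-value for that action is used as the estimate.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]
```

Decoupling action selection from evaluation is what reduces the overestimation that plain DQN suffers from when the same noisy estimates are used for both steps.
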