Random Network Distillation Based Deep Reinforcement Learning for AGV Path Planning

Huilin Yin, Shengkai Su, Yinjia Lin, Pengju Zhen, Karin Festl, Daniel Watzenig
2024-04-19
Abstract: With the flourishing development of intelligent warehousing systems, Automated Guided Vehicle (AGV) technology has experienced rapid growth. Within intelligent warehousing environments, an AGV is required to safely and rapidly plan an optimal path in complex and dynamic environments. Much research has applied deep reinforcement learning to address this challenge. However, in environments with sparse extrinsic rewards, these algorithms often converge slowly, learn inefficiently, or fail to reach the target. Random Network Distillation (RND), as an exploration enhancement, can effectively improve the performance of proximal policy optimization, in particular by providing additional intrinsic rewards to the AGV agent in sparse-reward environments. Moreover, most current research continues to use 2D grid mazes as experimental environments. These environments have insufficient complexity and limited action sets. To address this limitation, we present simulation environments for AGV path planning with continuous actions and positions, so that they are closer to realistic physical scenarios. Our experiments and comprehensive analysis demonstrate that the proposed method enables the AGV to complete path planning tasks with continuous actions more rapidly in our environments. A video of part of our experiments can be found at
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses path planning for Automated Guided Vehicles (AGVs) in complex, dynamic environments, where existing deep reinforcement learning (DRL) methods learn inefficiently and converge slowly when rewards are sparse. The authors propose RND-PPO, a DRL method that uses Random Network Distillation (RND) to improve the exploration of the Proximal Policy Optimization (PPO) algorithm, especially in sparse-reward environments. Most existing work evaluates AGV path planning in 2D grid mazes, which capture neither the complexity nor the action set of real environments. The paper therefore builds an AGV path planning simulation environment with continuous actions and positions, which is closer to real physical scenarios. RND-PPO adds intrinsic rewards to the sparse extrinsic reward, which lets the AGV complete tasks with continuous actions faster. According to the experimental results, RND-PPO completes path planning tasks in complex environments more efficiently and stably than an AGV trained with PPO alone, especially when the target is dynamic. The intrinsic rewards encourage the AGV to explore the whole environment rather than only locate a single rewarded target, which helps it adapt to changes in the external environment. In conclusion, the paper proposes an AGV path planning method that combines RND with PPO and improves path planning efficiency in complex, dynamic environments.
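The intrinsic-reward idea summarized above can be illustrated with a minimal sketch. In RND as generally described, a fixed, randomly initialized target network maps each observation to a feature vector, a predictor is trained to match it, and the prediction error serves as the intrinsic reward, so states the agent has rarely seen earn a larger bonus. Everything specific below is an assumption made for illustration and is not taken from the paper: the class name `RNDBonus`, the linear predictor, the single tanh layer used as the target, the 2D position observation, the learning rate, and the weighting `beta` in `total_reward`.

```python
import numpy as np


class RNDBonus:
    """Toy RND bonus: fixed random target, trainable linear predictor (illustrative only)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Target weights are drawn once and never updated.
        self.w_target = rng.normal(size=(feat_dim, obs_dim))
        # Predictor weights start at zero and are trained on visited observations.
        self.w_pred = np.zeros((feat_dim, obs_dim))
        self.lr = lr

    def _error(self, obs):
        target = np.tanh(self.w_target @ obs)
        return self.w_pred @ obs - target

    def intrinsic_reward(self, obs):
        # Mean squared prediction error: large for unfamiliar observations.
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        # One gradient-descent step on the mean squared error w.r.t. w_pred.
        err = self._error(obs)
        self.w_pred -= self.lr * (2.0 / err.size) * np.outer(err, obs)


def total_reward(extrinsic, intrinsic, beta=0.1):
    # Assumed combination rule: extrinsic reward plus a scaled intrinsic bonus.
    return extrinsic + beta * intrinsic


bonus = RNDBonus(obs_dim=2)
visited = np.array([0.5, 0.5])   # hypothetical AGV position seen many times
novel = np.array([-0.8, 0.3])    # hypothetical position never seen in training

before = bonus.intrinsic_reward(visited)
for _ in range(200):
    bonus.update(visited)
after = bonus.intrinsic_reward(visited)
novel_bonus = bonus.intrinsic_reward(novel)

print(f"visited: {before:.4f} -> {after:.6f}, novel: {novel_bonus:.4f}")
```

In this toy setting the bonus for the repeatedly visited position shrinks toward zero while the unvisited position keeps a larger bonus, which is the mechanism that pushes an agent toward unexplored regions when the extrinsic reward is sparse. A full RND-PPO agent would feed a reward like the one from `total_reward` into PPO's advantage estimation; that part is omitted here.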