UAV Path Planning Based on Multicritic-Delayed Deep Deterministic Policy Gradient

Runjia Wu, Fangqing Gu, Hai-lin Liu, Hongjian Shi
DOI: https://doi.org/10.1155/2022/9017079
2022-03-14
Wireless Communications and Mobile Computing
Abstract: The deep deterministic policy gradient (DDPG) algorithm is a reinforcement learning method that has been widely used in UAV path planning. However, the critic network of DDPG is updated frequently during training. This leads to an inevitable overestimation problem and increases the computational complexity of training. Therefore, this paper presents a multicritic-delayed DDPG method for UAV path planning. It uses multiple critic networks and delayed learning to reduce the overestimation problem of DDPG, and it adds noise to improve robustness in real environments. Moreover, a UAV mission platform is built to train the proposed method and to evaluate its effectiveness and robustness. Simulation results show that the proposed algorithm converges faster, converges to a better result, and is more stable, which indicates that the UAV can learn more from the complex environment.
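The abstract does not specify how the multiple critics are combined or how the delay is scheduled. The following is a minimal sketch, assuming a TD3-style scheme: the Bellman target takes the minimum (or, optionally, the mean) over several target critics, and the actor is updated only once every `policy_delay` critic updates. All names and parameters (`multicritic_target`, `reduce`, `policy_delay`, `tau`) are hypothetical and do not come from the authors' code. The robustness noise mentioned in the abstract is not modeled here.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: a critic maps (state, action) to a scalar Q-value estimate.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def multicritic_target(
    reward: float,
    done: bool,
    next_state: Sequence[float],
    next_action: Sequence[float],
    target_critics: List[Critic],
    gamma: float = 0.99,
    reduce: str = "min",
) -> float:
    """Bellman target built from several target critics.

    reduce="min" keeps the most pessimistic estimate (the TD3 choice), which
    counters overestimation; reduce="mean" averages the critics instead.
    Which rule the paper actually uses is an assumption here.
    """
    if not target_critics:
        raise ValueError("need at least one target critic")
    qs = [q(next_state, next_action) for q in target_critics]
    if reduce == "min":
        agg = min(qs)
    elif reduce == "mean":
        agg = sum(qs) / len(qs)
    else:
        raise ValueError(f"unknown reduce rule: {reduce!r}")
    # No bootstrapping past a terminal transition.
    return reward + gamma * (0.0 if done else 1.0) * agg


def should_update_actor(critic_step: int, policy_delay: int = 2) -> bool:
    """Delayed learning: update the actor (and target nets) once every
    `policy_delay` critic updates instead of on every step."""
    return critic_step % policy_delay == 0


def soft_update(target: List[float], source: List[float], tau: float = 0.005) -> List[float]:
    """Polyak averaging of target parameters, as in standard DDPG."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]


if __name__ == "__main__":
    # Toy critics that ignore their inputs, just to show the aggregation.
    critics = [lambda s, a: 1.0, lambda s, a: 2.0, lambda s, a: 3.0]
    print(multicritic_target(1.0, False, [0.0], [0.0], critics, gamma=0.5))  # 1.5
    print(multicritic_target(1.0, False, [0.0], [0.0], critics, gamma=0.5, reduce="mean"))  # 2.0
    print([k for k in range(1, 7) if should_update_actor(k)])  # [2, 4, 6]
```

Taking the minimum biases the target downward, which offsets the upward bias a single critic accumulates when the actor exploits its errors. Delaying actor updates lets the critics settle between policy changes and also reduces the number of actor updates per training step.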
Subject categories: Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic