Proximal Policy Optimization for Multi-rotor UAV Autonomous Guidance, Tracking and Obstacle Avoidance

Hu Duoxiu, Dong Wenhan, Xie Wujie, He Lei
DOI: https://doi.org/10.1007/s42405-021-00427-2
IF: 1.233
2022-01-25
International Journal of Aeronautical and Space Sciences
Abstract: This study develops a Markov decision process model with two stages, long-distance autonomous guidance and short-distance autonomous tracking with obstacle avoidance, to address the performance problem of multi-rotor unmanned aerial vehicles (UAVs) operating against dynamic ground targets. On this basis, an improved proximal policy optimization (PPO) algorithm is proposed. The proposed algorithm uses a long short-term memory (LSTM) network to calculate reward values, update network parameters, and perform adaptive optimization iterations from state information, such as the real-time relative position between the UAV and the target, taking into account both the time-sequential data received from the UAV and the environmental context. Finally, simulation experiments were performed on a platform based on a robot control system. The results show that the proposed method can safely and effectively achieve autonomous maneuvering throughout the entire reconnaissance mission. Compared with the conventional PPO algorithm, introducing the LSTM network shortened model training time, considerably improved tracking and obstacle-avoidance efficiency, and further strengthened the robustness, accuracy, and real-time performance of the algorithm.
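The abstract builds on PPO and on a two-stage decision model, so a small sketch may help readers unfamiliar with either. The code below is a minimal illustration, assuming the standard PPO clipped surrogate objective, L = mean_t[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)], where r_t is the new-to-old policy probability ratio and A_t the advantage estimate. It is not the authors' improved algorithm: their LSTM-based processing of time-sequential state is not shown. The stage selector, the function names, and the `switch_distance` parameter are hypothetical; the abstract does not state how the model switches from guidance to tracking and obstacle avoidance.

```python
import math


def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratios[t]     = pi_new(a_t | s_t) / pi_old(a_t | s_t)
    advantages[t] = advantage estimate A_t for step t
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio into [1 - epsilon, 1 + epsilon].
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def select_stage(uav_xy, target_xy, switch_distance=50.0):
    """HYPOTHETICAL two-stage switch: long-distance guidance until the UAV
    is within switch_distance of the target, then short-distance tracking
    with obstacle avoidance. The threshold rule is an illustrative assumption."""
    distance = math.dist(uav_xy, target_xy)
    return "guidance" if distance > switch_distance else "tracking_avoidance"
```

For example, `clipped_surrogate([1.5, 0.5], [1.0, -1.0])` evaluates to roughly 0.2: the first ratio is clipped down to 1.2, and for the negative advantage the clipped term (0.8 * -1.0) is the smaller one. Clipping keeps each policy update close to the previous policy, which is the property PPO variants such as the one in this paper inherit.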