Path Planning for Multi-UAV Based on Improved Proximal Policy Optimization Algorithm

Wenya Zhu, Wenxing Fang, Yanxu Su
DOI: https://doi.org/10.1109/yac63405.2024.10598516
2024-01-01
Abstract: This paper explores the application of reinforcement learning to path planning for multiple unmanned aerial vehicles (multi-UAV). The standard proximal policy optimization (PPO) algorithm suffers from low sample efficiency and unstable performance. We introduce a refined version of PPO called RB-PPO (Proximal Policy Optimization with Replay Buffer). RB-PPO reuses off-policy data stored in a replay buffer to improve the sample efficiency of PPO. It also incorporates a rollback operation into the objective function to constrain the difference between the new and old policies, which makes policy updates more stable. RB-PPO thus combines the stability of on-policy algorithms with the sample efficiency of off-policy algorithms. Experimental results indicate that RB-PPO converges faster and achieves higher training rewards than standard PPO.
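The abstract names two mechanisms but does not give their exact form, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common form of rollback clipping, in which the surrogate keeps PPO's `ratio * adv` inside `[1 - eps, 1 + eps]` but switches to a slope of `-alpha * adv` outside it, and it assumes replayed samples are reweighted by the ratio between the current policy and the behavior policy that collected them. The names `rollback_surrogate`, `ReplayBuffer`, `eps`, `alpha`, the FIFO eviction rule and the toy data are all hypothetical.

```python
import numpy as np
from collections import deque


def rollback_surrogate(ratio, adv, eps=0.2, alpha=0.3):
    """Per-sample rollback-clipped surrogate, to be maximized (assumed form).

    Inside [1 - eps, 1 + eps] this equals ratio * adv, exactly as in PPO.
    When the ratio leaves that range in the direction the advantage favours,
    the slope becomes -alpha * adv instead of 0, so gradient ascent actively
    pulls the new policy back toward the old one.
    """
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    # Both rollback lines meet `ratio` continuously at the clip boundaries.
    upper = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    lower = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    f = np.where((adv > 0) & (ratio >= 1.0 + eps), upper,
                 np.where((adv < 0) & (ratio <= 1.0 - eps), lower, ratio))
    return f * adv


class ReplayBuffer:
    """Hypothetical FIFO buffer of past transitions for off-policy reuse."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest entries are evicted first

    def add(self, obs, action, behavior_logp, adv):
        # Keep the log-probability under the policy that collected the sample,
        # so the importance ratio can be formed against it later.
        self.data.append((obs, action, behavior_logp, adv))

    def sample(self, batch_size, rng):
        # Draw distinct indices; never ask for more items than are stored.
        n = min(batch_size, len(self.data))
        idx = rng.choice(len(self.data), size=n, replace=False)
        return [self.data[int(i)] for i in idx]


# Toy usage: random numbers stand in for a real UAV environment and policy network.
rng = np.random.default_rng(0)
buf = ReplayBuffer(capacity=1000)
for t in range(8):
    buf.add(obs=t, action=t % 2, behavior_logp=np.log(0.5), adv=rng.normal())

batch = buf.sample(4, rng)
behavior_logp = np.array([b[2] for b in batch])
adv = np.array([b[3] for b in batch])
new_logp = behavior_logp + rng.normal(scale=0.3, size=len(batch))  # stand-in for the updated policy
ratio = np.exp(new_logp - behavior_logp)  # pi_new(a|s) / pi_behavior(a|s)
objective = rollback_surrogate(ratio, adv).mean()
print(objective)
```

The design choice being illustrated is the difference from plain clipping: PPO's clipped objective only stops rewarding a ratio that has left the trust region (zero gradient), while the rollback term penalizes it, which is one way to read "constrain the difference between new and old policies". In a real system the advantages stored with old transitions would be stale and would typically be recomputed or corrected before reuse; this sketch does not attempt that.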