Multi-UAV Path Planning and Following Based on Multi-Agent Reinforcement Learning

Xiaoru Zhao, Rennong Yang, Liangsheng Zhong, Zhiwei Hou
DOI: https://doi.org/10.3390/drones8010018
IF: 5.532
2024-01-11
Drones
Abstract: Dedicated to meeting the growing demand for multi-agent collaboration in complex scenarios, this paper introduces a parameter-sharing off-policy multi-agent path planning and following approach. Current multi-agent path planning predominantly relies on grid-based maps, whereas our proposed approach uses laser scan data as input, which more closely reflects real-world applications. In this approach, the unmanned aerial vehicle (UAV) uses the soft actor–critic (SAC) algorithm as a planner and trains its policy to convergence. This policy enables end-to-end processing of laser scan data, guiding the UAV to avoid obstacles and reach the goal. At the same time, the planner incorporates paths generated by a sampling-based method as following points. The following points are continuously updated as the UAV progresses. Sharing experiences among agents facilitates multi-UAV path planning tasks and accelerates policy convergence. To address the challenge of UAVs that are initially stationary and overly cautious near the goal, a reward function is designed to encourage UAV movement. Additionally, a multi-UAV simulation environment is established to simulate real-world UAV scenarios and to support training and validation of the proposed approach. The simulation results highlight the effectiveness of the presented approach in both the training process and task performance. The presented algorithm achieves an 80% success rate in guiding three UAVs to their goal points.
What problem does this paper attempt to address?
This paper addresses multi-UAV (unmanned aerial vehicle) path planning and following in complex scenarios. Existing multi-UAV path planning mainly depends on grid-based maps, whereas the proposed method takes laser scan data as input, which is closer to real-world applications. The paper introduces a parameter-sharing, off-policy, multi-agent path planning and following method that uses the soft actor-critic (SAC) algorithm as the planner and trains its policy to convergence. The method processes laser scan data end to end, guiding each UAV to avoid obstacles and reach its goal. The planner also uses paths generated by a sampling-based method as following points, which are continuously updated as the UAV moves forward. Sharing experiences among agents accelerates policy convergence in multi-UAV path planning tasks. To address UAVs that stay stationary at the start or become overly cautious near the goal, the paper designs a reward function that encourages movement. A multi-UAV simulation environment is also established to simulate real-world UAV scenarios and to support training and validation of the proposed method. Simulation results show that the algorithm is effective in both the training process and task performance, achieving an 80% success rate in guiding three UAVs to their goal points.
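The mechanisms summarized above (experience sharing among agents, continuously updated following points, and a reward that discourages standing still) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `SharedReplayBuffer`, `advance_following_point`, and `shaped_reward`, the reach-radius rule, and the penalty constants are all hypothetical choices made for illustration, and the SAC update itself is omitted.

```python
import math
import random
from collections import deque


class SharedReplayBuffer:
    """Single buffer that all UAVs write to (hypothetical sketch).

    With parameter sharing, one policy is trained from the pooled
    transitions of every agent instead of one buffer per agent.
    """

    def __init__(self, capacity):
        # deque with maxlen drops the oldest transition when full
        self.storage = deque(maxlen=capacity)

    def add(self, agent_id, obs, action, reward, next_obs, done):
        self.storage.append((agent_id, obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling across all agents' experience
        k = min(batch_size, len(self.storage))
        return random.sample(list(self.storage), k)

    def __len__(self):
        return len(self.storage)


def advance_following_point(position, path, index, reach_radius):
    """Return the index of the waypoint the UAV should follow next.

    Assumption: a waypoint counts as reached once the UAV is within
    reach_radius of it; the final waypoint (the goal) is never skipped.
    """
    while index < len(path) - 1 and math.dist(position, path[index]) <= reach_radius:
        index += 1
    return index


def shaped_reward(prev_dist, curr_dist, speed, min_speed=0.1, idle_penalty=0.05):
    """Hypothetical reward: progress toward the following point,
    minus a small penalty when the UAV is nearly stationary."""
    reward = prev_dist - curr_dist
    if speed < min_speed:
        reward -= idle_penalty
    return reward
```

In a training loop under these assumptions, each UAV would call `add` on the same buffer after every step, and the shared SAC policy would be updated from batches returned by `sample`. The idle penalty is one simple way to counter the stationary and overly cautious behavior the paper describes; the reward actually used by the authors may differ.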