Oracle-Guided Deep Reinforcement Learning for Large-Scale Multi-UAVs Flocking and Navigation.

Wen Wang, Liang Wang, Junfeng Wu, Xianping Tao, Haijun Wu
DOI: https://doi.org/10.1109/tvt.2022.3184043
IF: 6.8
2022-01-01
IEEE Transactions on Vehicular Technology
Abstract: The flocking and navigation control of large-scale Unmanned Aerial Vehicle (UAV) swarms has received considerable research interest because UAVs are widely applied in many fields. Compared to traditional non-learning-based flocking and navigation control methods, reinforcement learning-based methods have the advantages of being model-free, flexible, and adaptable. In this paper, we formulate the flocking and navigation control of the UAV swarm as a Markov Decision Process (MDP) and use multi-agent reinforcement learning methods to solve the problem. Reinforcement learning introduces two significant challenges: the scalability issue and the partial observations of each UAV. To tackle the scalability issue, we adopt an independent learning and parameter sharing scheme, which extends single-agent reinforcement learning algorithms to the multi-agent scenario. For the partial observations, we propose an oracle-guided two-stage training and execution scheme, which utilizes the flock center during the training phase but avoids the dependence on the flock center during the execution phase. We design the oracle-guided observations and rewards and build a highly efficient simulation environment to conduct experiments. Simulation results show that the policy trained with our method performs well with up to thirty-two UAVs and outperforms the policy trained with local observations.
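The two ideas named in the abstract, parameter sharing and oracle-guided observations, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the paper's observation space, reward, network, or how the second stage removes the dependence on the flock center. Here the flock center is taken to be the mean UAV position, the execution-time observation estimates it from neighbors within a hypothetical `sensing_radius`, and a hand-written proportional rule (`shared_policy`) stands in for the learned network.

```python
from typing import List, Tuple

Vec = Tuple[float, float]
Obs = Tuple[float, float, float, float]


def flock_center(positions: List[Vec]) -> Vec:
    """Mean position of the given UAVs (assumed definition of the flock center)."""
    n = len(positions)
    return (sum(p[0] for p in positions) / n, sum(p[1] for p in positions) / n)


def oracle_observation(i: int, positions: List[Vec], target: Vec) -> Obs:
    """Training-time observation: offsets to the true flock center and to the target."""
    cx, cy = flock_center(positions)
    x, y = positions[i]
    return (cx - x, cy - y, target[0] - x, target[1] - y)


def local_observation(i: int, positions: List[Vec], target: Vec,
                      sensing_radius: float) -> Obs:
    """Execution-time observation: the center is estimated from UAVs within range.

    This neighbor-based estimate is an illustrative assumption, not the paper's method.
    """
    x, y = positions[i]
    r2 = sensing_radius ** 2
    # UAV i is always within range of itself, so `visible` is never empty.
    visible = [p for p in positions if (p[0] - x) ** 2 + (p[1] - y) ** 2 <= r2]
    cx, cy = flock_center(visible)
    return (cx - x, cy - y, target[0] - x, target[1] - y)


def shared_policy(obs: Obs, gain: float = 0.5) -> Vec:
    """Placeholder for the shared network: one parameter set (`gain`) used by every UAV."""
    to_center_x, to_center_y, to_target_x, to_target_y = obs
    return (gain * (to_center_x + to_target_x), gain * (to_center_y + to_target_y))


positions = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
target = (5.0, 5.0)

# Independent learning with parameter sharing: each UAV acts on its own
# observation, but all of them call the same policy function.
train_actions = [shared_policy(oracle_observation(i, positions, target))
                 for i in range(len(positions))]
exec_actions = [shared_policy(local_observation(i, positions, target, 3.0))
                for i in range(len(positions))]
```

In this toy setup the third UAV sees no neighbors at execution time, so its estimate of the center collapses to its own position. That gap between global and local information is the partial-observation problem the abstract describes.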