Mean policy-based proximal policy optimization for maneuvering decision in multi-UAV air combat
Yifan Zheng, Bin Xin, Bin He, Yulong Ding
DOI: https://doi.org/10.1007/s00521-024-10261-8
2024-08-09
Neural Computing and Applications
Abstract: Autonomous maneuvering decision-making is a crucial technology for Unmanned Aerial Vehicles (UAVs) to achieve air dominance in modern unmanned warfare. Because it balances exploration and exploitation and, when combined with deep neural networks, delivers end-to-end outputs immediately, multi-agent reinforcement learning (MARL) has made remarkable achievements in multi-UAV autonomous air combat maneuvering decision-making (MUAAMD). However, learning effective cooperative policies remains a challenge for MARL methods under the centralized training with decentralized execution (CTDE) paradigm. This paper proposes a MARL-based method to improve cooperation performance in MUAAMD. First, considering the constraints of dynamic and limited perception for UAVs in realistic air combat scenarios, the MUAAMD problem is formulated as a partially observable Markov game (POMG). Second, a novel and efficient MARL algorithm named mean policy-based proximal policy optimization (MP3O) is introduced. Specifically, a joint policy optimization mechanism is constructed by estimating the policies of neighboring agents in a group with a mean-field approximation during training, which enables both centralized evaluation and improvement of the cooperative policy under the CTDE paradigm. Third, by combining MP3O with three improvement techniques, a cooperative decision-making framework for MUAAMD is proposed. Simulations and comparative experiments validate the effectiveness of the proposed method in promoting cooperative policy learning for the MUAAMD problem.
computer science, artificial intelligence
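The abstract describes two ingredients: a mean-field estimate of neighboring agents' policies and proximal policy optimization. As a rough illustration only, the sketch below averages the neighbors' action distributions and evaluates the standard PPO clipped surrogate for a single sample. The abstract does not state how MP3O combines the two, so nothing here should be read as the authors' formulation; the function names, the discrete action space, and the sample numbers are all assumptions made for the example.

```python
def mean_neighbor_policy(neighbor_probs):
    """Element-wise average of the neighbors' action distributions.

    Assumption: each neighbor's policy is a discrete probability vector
    over the same action set. This is a generic mean-field estimate,
    not necessarily the exact estimator used in MP3O.
    """
    n = len(neighbor_probs)
    num_actions = len(neighbor_probs[0])
    return [sum(p[a] for p in neighbor_probs) / n for a in range(num_actions)]


def clipped_surrogate(new_prob, old_prob, advantage, eps=0.2):
    """Standard PPO clipped objective for one (action, advantage) sample."""
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    # PPO takes the pessimistic (smaller) of the two surrogate terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical action distributions of three neighboring UAVs.
neighbors = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
mean_pi = mean_neighbor_policy(neighbors)

# Hypothetical probabilities of the taken action under new and old policies.
positive_case = clipped_surrogate(new_prob=0.9, old_prob=0.6, advantage=1.0)
negative_case = clipped_surrogate(new_prob=0.9, old_prob=0.6, advantage=-1.0)
```

Averaging probability vectors keeps the result a valid distribution, which is why a mean policy can stand in for the individual neighbors during centralized training. The clipping bounds how far a single update can move the policy when the advantage is positive, while leaving the penalty unclipped when the advantage is negative.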