An Off-COMA Algorithm for Multi-UCAV Intelligent Combat Decision-Making

Zhengkang Shi, Jingcheng Wang, Hongyuan Wang
DOI: https://doi.org/10.1109/docs55193.2022.9967776
2022-01-01
Abstract: Unmanned Combat Aerial Vehicles (UCAVs) play an important role in modern warfare, and their level of intelligent decision-making urgently needs to be improved. In this paper, a simplified multi-UCAV combat environment is established and modeled as a multi-agent Markov game. The multi-UCAV combat problem poses several difficulties, including strong randomness and complexity, sparse rewards, and the lack of strong opponents to train against. To address these difficulties, an algorithm called Off Counterfactual Multi-Agent (Off-COMA) is proposed. It extends the COMA algorithm to an off-policy version that can reuse historical data for training, which improves data utilization. In addition, Off-COMA uses an improved prioritized experience replay method to deal with sparse rewards. The paper also presents an asymmetric policy replay self-play method, which provides a guarantee that the algorithm generates a powerful policy. Finally, a comparison with several classical multi-agent reinforcement learning algorithms verifies the superiority of Off-COMA in solving the multi-UCAV combat problem.
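For readers unfamiliar with COMA, the following is a minimal sketch of the counterfactual advantage used by the original (on-policy) COMA algorithm that Off-COMA extends. It is not the authors' implementation: the abstract does not specify how the off-policy extension, the improved prioritized replay, or the self-play scheme are built, so none of those are shown. The function name, argument names, and the numbers are illustrative assumptions.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy: Sequence[float],
    action: int,
) -> float:
    """COMA-style advantage for one agent, with teammates' actions held fixed.

    q_values[k]: centralized critic estimate of the joint-action value when
                 this agent picks action k and all other agents keep the
                 actions they actually took (hypothetical input).
    policy[k]:   this agent's probability of choosing action k.
    action:      index of the action the agent actually took.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    # Counterfactual baseline: expected value under the agent's own policy,
    # marginalizing out only this agent's action.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[action] - baseline


# Illustrative numbers only (not taken from the paper).
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action=1)
print(adv)
```

Because the baseline is the policy-weighted mean of the same critic values, the advantage averages to zero under the agent's own policy, which is what lets it isolate a single agent's contribution to the shared reward.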