Abstract: As air battlefields become increasingly intelligent and information-driven, intelligent air combat has become a key factor in determining battlefield outcomes. Conventional multi-aircraft air combat suffers from inefficient intelligent decision-making, difficulty in meeting the demands of complex air combat environments, and unreasonable target allocation. To address these problems, we introduce a long short-term memory-proximal policy optimization (LSTM-PPO) algorithm. A long short-term memory network extracts features from the state and perceives the situation. The agent trains a residual network and a value network on normalized, feature-fused state information and selects the optimal action for the current situation through the proximal policy optimization strategy. A reward function containing expert knowledge is embedded in the training process to address the problem of sparse rewards. We also present a target allocation algorithm based on threat value calculation: using angle, speed, and height threat values as the basis for allocation, the ID of the target aircraft with the highest threat value on the battlefield is computed in real time, and target allocation is carried out whenever the strategy network outputs an attack action. To confirm the effectiveness of the algorithm, we carried out 4v4 multi-aircraft air combat experiments in a digital twin simulation environment built by our research group. The red team consists of reinforcement learning agents based on the LSTM-PPO algorithm, whereas the blue team comprises a finite state machine built from expert knowledge bases. After more than 1200 rounds of aerial confrontation, the algorithm converged, and the red team's win rate reached 82%.
Furthermore, we assessed the performance of four other mainstream reinforcement learning algorithms in 4v4 air combat experiments under the same experimental conditions. The results show that the deep Q-network (DQN) and soft actor-critic (SAC) algorithms have difficulty handling high-dimensional continuous action spaces and multi-agent collaboration. The multi-agent deep deterministic policy gradient (MADDPG) algorithm employs a multi-agent strategy and cooperative training, so it achieves a significantly higher win rate than the DQN and SAC algorithms. The multi-agent proximal policy optimization (MAPPO) algorithm has a relatively high failure rate and, in some cases, is not stable enough to cope with the enemy aircraft's strategies. The LSTM-PPO algorithm achieves a significantly higher win rate than the other mainstream reinforcement learning algorithms in multi-aircraft collaborative air combat, which confirms its effectiveness in dealing with high-dimensional continuous action spaces and multi-aircraft collaborative operations.
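
The threat-based target allocation described in the abstract can be illustrated with a minimal Python sketch. The abstract names only the three threat components (angle, speed, height) and the rule that the highest-threat aircraft is selected when the policy outputs an attack action. Everything else here is an assumption made for illustration: the `Aircraft` record, the specific formulas for each component, the equal weights, the 1000 m height scale, and the `"attack"` action label are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Aircraft:
    """Hypothetical state record; not the paper's state representation."""
    ident: int
    pos: tuple  # (x, y, altitude) in metres
    vel: tuple  # velocity vector in m/s


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def angle_threat(own, enemy):
    """Assumed form: 1.0 if the enemy flies straight at us, 0.0 if straight away."""
    los = tuple(o - e for o, e in zip(own.pos, enemy.pos))  # enemy -> own
    denom = _norm(los) * _norm(enemy.vel)
    if denom == 0.0:
        return 0.5  # undefined geometry: treat as neutral
    cos_a = sum(a * b for a, b in zip(los, enemy.vel)) / denom
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding error
    return 1.0 - math.acos(cos_a) / math.pi


def speed_threat(own, enemy):
    """Assumed form: the enemy's share of the combined speed, in [0, 1]."""
    total = _norm(own.vel) + _norm(enemy.vel)
    return 0.5 if total == 0.0 else _norm(enemy.vel) / total


def height_threat(own, enemy, scale=1000.0):
    """Assumed form: smooth altitude advantage in (0, 1); scale is hypothetical."""
    return 0.5 + 0.5 * math.tanh((enemy.pos[2] - own.pos[2]) / scale)


def threat_value(own, enemy, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three components; equal weights are an assumption."""
    parts = (
        angle_threat(own, enemy),
        speed_threat(own, enemy),
        height_threat(own, enemy),
    )
    return sum(w * p for w, p in zip(weights, parts))


def select_target(own, enemies):
    """Return the ID of the enemy aircraft with the highest threat value."""
    return max(enemies, key=lambda e: threat_value(own, e)).ident


def allocate_if_attack(action, own, enemies):
    """Allocate a target only when the policy's action is an attack."""
    if action != "attack" or not enemies:
        return None
    return select_target(own, enemies)


# Usage: two enemies at the same position, one inbound and one outbound.
own = Aircraft(0, (0.0, 0.0, 5000.0), (200.0, 0.0, 0.0))
inbound = Aircraft(1, (10000.0, 0.0, 5000.0), (-250.0, 0.0, 0.0))
outbound = Aircraft(2, (10000.0, 0.0, 5000.0), (250.0, 0.0, 0.0))
target_id = allocate_if_attack("attack", own, [inbound, outbound])
```

In this example the two enemies differ only in heading, so the angle component alone decides the allocation and the inbound aircraft is selected. Normalizing each component to the range [0, 1] keeps the weights comparable. The actual component definitions and weights would need to come from the paper itself.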

Enhanced LSTM‐DQN algorithm for a two‐player zero‐sum game in three‐dimensional space

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Research on Autonomous Manoeuvre Decision Making in Within-Visual-Range Aerial Two-Player Zero-Sum Games Based on Deep Reinforcement Learning

Intelligent Decision Making and Target Assignment of Multi-Aircraft Air Combat Based on the LSTM-PPO Algorithm

MathDQN: Solving Arithmetic Word Problems Via Deep Reinforcement Learning.

Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games

M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network

FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game

DouZero+: Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning

RARSMSDou: Master the Game of DouDiZhu With Deep Reinforcement Learning Algorithms

Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning: Results for the Fighting Game AI Competition

DouRN: Improving DouZero by Residual Neural Networks

Air Combat Maneuver Decision Based on Deep Reinforcement Learning and Game Theory

Air-Combat Strategy Using Deep Q-Learning

Research on Action Strategies and Simulations of DRL and MCTS-based Intelligent Round Game

Research and Implementation of Intelligent Decision Based on a Priori Knowledge and DQN Algorithms in Wargame Environment

AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding

Combining Tree Search and Action Prediction for State-of-the-Art Performance in DouDiZhu

A Multi-Step Minimax Q-learning Algorithm for Two-Player Zero-Sum Markov Games

Pursuit and Evasion Strategy of a Differential Game Based on Deep Reinforcement Learning

A Deep Reinforcement Learning-Based Approach in Porker Game