Abstract: With advances in unmanned aerial vehicle (UAV) and artificial intelligence (AI) technology, intelligent UAVs are expected to be widely used in future autonomous aerial combat. Previous research on autonomous aerial combat within visual range (WVR) is limited by simplifying assumptions, limited robustness, and neglect of sensor errors. To account for aircraft sensor errors, this paper models WVR aerial combat as a state-adversarial Markov decision process (SA-MDP), which introduces small adversarial perturbations on state observations. These perturbations do not alter the environment directly, but they can mislead the agent into making suboptimal decisions. We then propose a high-performance, highly robust algorithm for generating autonomous aerial combat maneuver strategies, based on the state-adversarial deep deterministic policy gradient (SA-DDPG) algorithm, which adds to the actor network a robustness regularizer related to an upper bound on performance loss. In addition, a reward shaping method based on maximum entropy (MaxEnt) inverse reinforcement learning (IRL) is proposed to improve the efficiency of the strategy generation algorithm. Finally, simulation experiments verify the efficiency of the strategy generation algorithm and the performance and robustness of the resulting aerial combat strategy. Our main contributions are three-fold. First, to account for UAV observation errors, we model air combat as an SA-MDP. Second, to make the air combat maneuver strategy network more robust in the presence of observation errors, we introduce regularizers into the policy gradient. Third, to address the sparsity of the air combat reward function, we use MaxEnt IRL to design a shaping reward that accelerates the convergence of SA-DDPG.
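As a rough illustration of the state-adversarial idea summarized in the abstract, the sketch below perturbs an observation inside a small l-infinity ball and penalizes how far the actor's action moves in response. It is not the paper's implementation: the toy one-layer actor, the random-sampling approximation of the worst-case perturbation, and the names `policy`, `perturb`, `robustness_penalty`, `actor_loss`, and `kappa` are all assumptions made for this example. Sampling only gives a lower bound on the worst case, whereas the abstract describes a regularizer tied to an upper bound on performance loss.

```python
import math
import random


def policy(weights, state):
    """Toy deterministic actor: a single tanh-squashed linear layer (illustrative only)."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def perturb(state, eps, rng):
    """Sample a perturbed observation inside the l-infinity ball of radius eps.

    Only the observation changes; the true environment state is untouched.
    """
    return [s + rng.uniform(-eps, eps) for s in state]


def robustness_penalty(weights, state, eps, n_samples=64, seed=0):
    """Sampled estimate of max over s' in B(s, eps) of ||pi(s) - pi(s')||^2.

    Assumption: random sampling stands in for a proper worst-case search,
    so this is a lower bound on the true maximum.
    """
    rng = random.Random(seed)
    clean = policy(weights, state)
    worst = 0.0
    for _ in range(n_samples):
        noisy = policy(weights, perturb(state, eps, rng))
        gap = sum((a - b) ** 2 for a, b in zip(clean, noisy))
        worst = max(worst, gap)
    return worst


def actor_loss(q_value, penalty, kappa=1.0):
    """DDPG-style actor objective (maximize Q) plus a weighted robustness term.

    kappa is a hypothetical regularization weight, not a value from the paper.
    """
    return -q_value + kappa * penalty


# Hypothetical 2-action actor over a 3-dimensional observation.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
s = [1.0, 0.5, -0.5]
penalty = robustness_penalty(W, s, eps=0.1)
loss = actor_loss(q_value=2.0, penalty=penalty, kappa=1.0)
```

A larger `eps` models larger sensor error and, for this toy actor, yields a larger penalty, so minimizing the loss pushes the actor toward actions that change little under small observation errors.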

UCAV Air Combat Maneuver Decisions Based on a Proximal Policy Optimization Algorithm with Situation Reward Shaping

UCAV Autonomous Maneuvering Decision Based on Curriculum Learning Mechanism Training

Mean policy-based proximal policy optimization for maneuvering decision in multi-UAV air combat

Research on Autonomous Maneuvering Decision of UCAV based on Approximate Dynamic Programming

UAV maneuver decision-making via deep reinforcement learning for short-range air combat

Model-free Maneuvering Control of Fixed-Wing UAVs Based on Deep Reinforcement Learning

Autonomous Maneuver Decision of UCAV Air Combat Based on Double Deep Q Network Algorithm and Stochastic Game Theory

Research on UCAV Maneuvering Decision Method Based on Heuristic Reinforcement Learning

UAV Cooperative Air Combat Maneuvering Confrontation Based on Multi-agent Reinforcement Learning

Maneuver Decision of UAV in Short-Range Air Combat Based on Deep Reinforcement Learning

Air Combat Maneuver Decision Method Based on A3C Deep Reinforcement Learning

Maneuver Decision-Making Through Proximal Policy Optimization And Monte Carlo Tree Search

Multi-intent autonomous decision-making for air combat with deep reinforcement learning

Predictive air combat decision model with segmented reward allocation

UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning

Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm

Autonomous Maneuver Decision Making of Dual-UAV Cooperative Air Combat Based on Deep Reinforcement Learning

Strategy Generation Based on DDPG with Prioritized Experience Replay for UCAV

Proximal Policy Optimization for Multi-rotor UAV Autonomous Guidance, Tracking and Obstacle Avoidance

Dynamic Control Allocation between Onboard and Delayed Remote Control for Unmanned Aircraft System Detect-and-Avoid

UAV Autonomous Aerial Combat Maneuver Strategy Generation with Observation Error Based on State-Adversarial Deep Deterministic Policy Gradient and Inverse Reinforcement Learning