Abstract: With the development of unmanned aerial vehicle (UAV) and artificial intelligence (AI) technology, intelligent UAVs will be widely used in future autonomous aerial combat. Previous research on autonomous aerial combat within visual range (WVR) is limited by simplifying assumptions, limited robustness, and neglect of sensor errors. In this paper, to account for aircraft sensor errors, we model WVR aerial combat as a state-adversarial Markov decision process (SA-MDP), which introduces small adversarial perturbations on state observations; these perturbations do not alter the environment directly, but can mislead the agent into making suboptimal decisions. We also propose a novel high-performance, highly robust algorithm for generating autonomous aerial combat maneuver strategies, based on the state-adversarial deep deterministic policy gradient (SA-DDPG) algorithm, which adds a robustness regularizer, related to an upper bound on performance loss, to the actor network. In addition, a reward shaping method based on the maximum entropy (MaxEnt) inverse reinforcement learning (IRL) algorithm is proposed to improve the efficiency of the strategy generation algorithm. Finally, the efficiency of the aerial combat strategy generation algorithm and the performance and robustness of the resulting strategy are verified by simulation experiments. Our main contributions are threefold. First, to introduce the observation errors of the UAV, we model air combat as an SA-MDP. Second, to make the air combat maneuver strategy network more robust in the presence of observation errors, we introduce regularizers into the policy gradient. Third, to address the sparsity of the air combat reward function, we use MaxEnt IRL to design a shaping reward that accelerates the convergence of SA-DDPG.
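
To make the state-adversarial idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the perturbation is bounded in the l-infinity norm by a hypothetical radius `eps`, uses a toy linear policy in place of the actor network, and estimates the regularizer by random sampling instead of the upper bound used in SA-DDPG. All names (`perturb_observation`, `linear_policy`, `robustness_penalty`) are illustrative.

```python
import random


def perturb_observation(state, eps, rng):
    """Return an observation inside an l-infinity ball of radius eps around
    the true state. The true state itself is left unchanged (assumption:
    the bound is l-infinity; the abstract only says perturbations are small)."""
    return [s + rng.uniform(-eps, eps) for s in state]


def linear_policy(weights, obs):
    """Toy deterministic policy: a single scalar action from a dot product.
    A stand-in for the actor network, which is not specified here."""
    return sum(w * o for w, o in zip(weights, obs))


def robustness_penalty(policy, state, eps, rng, n_samples=64):
    """Sampled estimate of the largest squared change in action when the
    observation is perturbed. This is a lower-bound approximation of a
    worst-case regularizer, not the bound derived for SA-DDPG."""
    clean_action = policy(state)
    worst = 0.0
    for _ in range(n_samples):
        noisy_action = policy(perturb_observation(state, eps, rng))
        worst = max(worst, (noisy_action - clean_action) ** 2)
    return worst


rng = random.Random(0)            # fixed seed so the sketch is repeatable
weights = [0.5, -1.0, 2.0]        # hypothetical policy parameters
true_state = [1.0, 0.2, -0.3]     # hypothetical true environment state
eps = 0.05                        # hypothetical sensor-error bound

observed = perturb_observation(true_state, eps, rng)
penalty = robustness_penalty(
    lambda obs: linear_policy(weights, obs), true_state, eps, rng
)
```

In a training loop, a term like `penalty` would be added to the actor loss with a weighting coefficient, so that the policy is discouraged from changing its action sharply when only the observation, and not the true state, changes.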

Two-Stage Strategy to Achieve a Reinforcement Learning-Based Upset Recovery Policy for Aircraft

Deep reinforcement learning-based upset recovery control for generic transport aircraft

Aircraft Upset Recovery Strategy and Pilot Assistance System Based on Reinforcement Learning

Model-free Maneuvering Control of Fixed-Wing UAVs Based on Deep Reinforcement Learning

Deep Reinforcement Learning Automatic Landing Control of Fixed-Wing Aircraft Using Deep Deterministic Policy Gradient

Precision Landing of Autonomous Parafoil System Via Deep Reinforcement Learning

Train Trajectory Optimization with High-Risk State Space Boundaries: A Safe Reinforcement Learning Approach

UAV Autonomous Aerial Combat Maneuver Strategy Generation with Observation Error Based on State-Adversarial Deep Deterministic Policy Gradient and Inverse Reinforcement Learning

Deep reinforcement learning with symmetric data augmentation applied for aircraft lateral attitude tracking control

Trajectory Planning for Airborne Radar in Extended Target Tracking Based on Deep Reinforcement Learning

A Policy-Reuse Algorithm Based on Destination Position Prediction for Aircraft Guidance Using Deep Reinforcement Learning

Memory-Enhanced Twin Delayed Deep Deterministic Policy Gradient (ME-TD3)-Based Unmanned Combat Aerial Vehicle Trajectory Planning for Avoiding Radar Detection Threats in Dynamic and Unknown Environments

Reward Function Optimization of a Deep Reinforcement Learning Collision Avoidance System

Model Predictive Control Based Washout Algorithm Design for Flight Simulator Upset Prevention and Recovery Training

Trajectory Tracking Control of Variable Sweep Aircraft Based on Reinforcement Learning

Robust Control Strategy for Quadrotor Drone Using Reference Model-Based Deep Deterministic Policy Gradient

Autonomous Obstacle Avoidance and Target Tracking of UAV Based on Deep Reinforcement Learning

Deep reinforcement learning for aircraft longitudinal control augmentation system

Target tracking strategy using deep deterministic policy gradient

Path Planning of Unmanned Aerial Vehicle in Complex Environments Based on State-Detection Twin Delayed Deep Deterministic Policy Gradient

Tube-based robust reinforcement learning for autonomous maneuver decision for UCAVs