Abstract: The reconnaissance of high-value targets is a prerequisite for effective operations. Deep reinforcement learning (DRL) has recently attracted attention for its success in navigation problems, but because the military field is competitive and complex, its applications there remain unsatisfactory. In this paper, an end-to-end DRL-based intelligent reconnaissance mission planning method is proposed for dual unmanned aerial vehicle (dual-UAV) cooperative reconnaissance missions in high-threat, dense situations. Specific mission properties and parameter requirements are considered throughout the modelling. Firstly, the reconnaissance mission is described as a Markov decision process (MDP), and the DRL-based mission planning model is established. Secondly, the environment and UAV motion parameters are standardized before being input to the neural network, to reduce the difficulty of algorithm convergence. According to the concrete mission requirements of remaining undetected by radars, dual-UAV cooperation, and wandering reconnaissance, four weighted reward functions are designed to improve the agent's understanding of the mission. To avoid sparse rewards, a clip function is used to control the range of the reward value. Finally, considering the continuous action space of reconnaissance mission planning, the widely applicable proximal policy optimization (PPO) algorithm is used. The simulation combines offline training with online planning. When the location and number of ground detection areas are varied (from 1 to 4), the PPO-based model maintains a reconnaissance proportion of 20% and a mission completion rate of 90%, and helps the reconnaissance UAV complete efficient path planning. The model adapts to unknown, continuous, high-dimensional environmental changes, generalizes, and shows strong intelligent planning performance.
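The abstract names three building blocks without giving their form: standardized inputs, a clipped weighted sum of reward terms, and PPO. The Python sketch below illustrates one plausible shape for each, under stated assumptions. The function names, the min-max scaling, the clip bounds, and the example weights are all hypothetical and are not taken from the paper. Only the last function, the per-sample PPO clipped surrogate, follows the standard published form of that objective. Note that PPO's ratio clipping is separate from the reward clipping the abstract describes.

```python
from typing import Sequence


def min_max_scale(value: float, low: float, high: float) -> float:
    """Map value from [low, high] to [0, 1], clamping out-of-range inputs.

    Assumption: min-max scaling is used here for illustration; the paper
    only says the parameters are standardized.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def clipped_weighted_reward(
    terms: Sequence[float],
    weights: Sequence[float],
    lo: float = -1.0,
    hi: float = 1.0,
) -> float:
    """Weighted sum of reward terms, clipped to [lo, hi].

    Assumption: the bounds and the linear combination are illustrative.
    """
    if len(terms) != len(weights):
        raise ValueError("terms and weights must have the same length")
    total = sum(w * r for w, r in zip(weights, terms))
    return min(hi, max(lo, total))


def ppo_clipped_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped_ratio = min(1.0 + eps, max(1.0 - eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical values: four reward terms with weights summing to 1.
    reward = clipped_weighted_reward([1.0, -0.5, 0.25, 0.0], [0.4, 0.3, 0.2, 0.1])
    print(reward)
    print(min_max_scale(5.0, 0.0, 10.0))
    print(ppo_clipped_term(1.5, 1.0))
```

Clipping the combined reward keeps its magnitude bounded regardless of how individual terms are scaled, which is one common way to keep value targets in a stable range during training.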

A deep reinforcement learning approach for multi-agent mobile robot patrolling

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding

Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning

An Energy-aware and Fault-tolerant Deep Reinforcement Learning based approach for Multi-agent Patrolling Problems

Balancing Efficiency and Unpredictability in Multi-robot Patrolling: A MARL-Based Approach

Autonomous Vehicle Patrolling Through Deep Reinforcement Learning: Learning to Communicate and Cooperate

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

A Deep Reinforcement Learning-Based Method Applied for Solving Multi-Agent Defense and Attack Problems

Decentralized Multi-Agent Reinforcement Learning with Global State Prediction

Multi-Agent Deep Reinforcement Learning For Persistent Monitoring With Sensing, Communication, and Localization Constraints

Multi-Robot Informative Path Planning for Efficient Target Mapping using Deep Reinforcement Learning

Efficient Domain Coverage for Vehicles with Second-Order Dynamics via Multi-Agent Reinforcement Learning

Efficient Multi-agent Navigation with Lightweight DRL Policy

Collaborative multi-agents in dynamic industrial internet of things using deep reinforcement learning

Deep Reinforcement Learning for Intelligent Dual-UAV Reconnaissance Mission Planning

Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management

Maximizing UAV Coverage in Maritime Wireless Networks: A Multiagent Reinforcement Learning Approach

Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments

The Design and Realization of Multi-agent Obstacle Avoidance based on Reinforcement Learning

Addressing unpredictable movements of dynamic obstacles with deep reinforcement learning to ensure safe navigation for omni-wheeled mobile robot

Path Planning of Autonomous Mobile Robot in Comprehensive Unknown Environment Using Deep Reinforcement Learning