Abstract: Multi-agent multi-target search strategies can be applied in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles (UAVs). To overcome the limitation of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained by DRL tend to be brittle because they are sensitive to the training environment, so the strategies they learn frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL lead to problems such as difficult convergence and inefficient use of the sampled data. To address the weakened robustness of the agents and the sparse rewards in the multi-target search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient based on the Parallel Hindsight Experience Replay (PHER-M3DDPG) algorithm, which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse-reward problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism that increases data utilization efficiency by introducing virtual learning targets and batch processing of the sampled data. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in convergence speed and task completion time in a multi-target search environment.
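The "virtual learning targets" idea can be illustrated with a minimal sketch of generic hindsight experience replay, assuming a 2-D search area, a sparse reward of 0 on reaching the goal and -1 otherwise, and the common "final state" relabeling strategy. All names here (`Transition`, `sparse_reward`, `relabel_with_final_goal`, `relabel_batch`) are hypothetical, and the batch helper simply loops over episodes; it is not the paper's parallel mechanism, whose details the abstract does not give.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Transition(NamedTuple):
    """One step of experience, conditioned on a goal (hypothetical layout)."""
    state: Point
    action: Point
    next_state: Point
    goal: Point
    reward: float


def sparse_reward(position: Point, goal: Point, tol: float = 0.1) -> float:
    """Assumed sparse reward: 0.0 within `tol` of the goal, otherwise -1.0."""
    dx, dy = position[0] - goal[0], position[1] - goal[1]
    return 0.0 if (dx * dx + dy * dy) ** 0.5 <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Treat the position actually reached at the end of the episode as a
    virtual goal, and recompute every reward against that goal."""
    virtual_goal = episode[-1].next_state
    return [
        t._replace(goal=virtual_goal,
                   reward=sparse_reward(t.next_state, virtual_goal))
        for t in episode
    ]


def relabel_batch(episodes: List[List[Transition]]) -> List[Transition]:
    """Collect original plus relabeled transitions for a batch of episodes.
    Sequential stand-in for batch processing; an assumption of this sketch."""
    buffer: List[Transition] = []
    for episode in episodes:
        buffer.extend(episode)
        buffer.extend(relabel_with_final_goal(episode))
    return buffer
```

In this sketch an episode that never reaches its real target still yields at least one transition with a non-negative reward after relabeling, which is how hindsight replay extracts a learning signal from otherwise unrewarded data.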
