Abstract: Multi-agent multi-target search strategies can be used in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To overcome the limitation of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained with DRL tend to be brittle because they are sensitive to the training environment, so the strategies they learn frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL lead to problems such as difficult convergence and inefficient use of the sampled data. To address the weakened robustness of the agents and the sparse rewards in the multi-target search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient algorithm based on Parallel Hindsight Experience Replay (PHER-M3DDPG), which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse rewards problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism that increases the efficiency of data utilization by introducing virtual learning targets and batch processing of the sampled data. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
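
The "virtual learning targets" idea can be illustrated with a minimal sketch of generic hindsight relabeling. This is not the PHER-M3DDPG implementation: it assumes the standard "future" relabeling strategy from hindsight experience replay, a 2-D point state, and a distance-threshold sparse reward. The names `sparse_reward`, `hindsight_relabel`, `tol`, and `k` are hypothetical, and the parallel batch processing and the minimax actor-critic update are omitted.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical types for illustration: a 2-D position and a transition
# (state, action, next_state, goal).
Point = Tuple[float, float]
Transition = Tuple[Point, Point, Point, Point]
Labeled = Tuple[Point, Point, Point, Point, float]


def sparse_reward(achieved: Point, goal: Point, tol: float = 0.1) -> float:
    """Return 0.0 if `achieved` is within `tol` of `goal`, otherwise -1.0."""
    dist = ((achieved[0] - goal[0]) ** 2 + (achieved[1] - goal[1]) ** 2) ** 0.5
    return 0.0 if dist <= tol else -1.0


def hindsight_relabel(
    episode: List[Transition],
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Labeled]:
    """Augment an episode with virtual goals taken from later achieved states.

    Each transition is stored once with its real goal and `k` more times
    with a goal sampled from positions reached at the same or a later step,
    so some stored samples carry a non-negative reward even when the real
    goal was never reached.
    """
    rng = rng or random.Random(0)
    out: List[Labeled] = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        # Original sample with the real (possibly never reached) goal.
        out.append((s, a, s_next, goal, sparse_reward(s_next, goal)))
        for _ in range(k):
            # Pick a step in [t, end] and treat its outcome as the goal.
            future = rng.randrange(t, len(episode))
            virtual_goal = episode[future][2]
            reward = sparse_reward(s_next, virtual_goal)
            out.append((s, a, s_next, virtual_goal, reward))
    return out


if __name__ == "__main__":
    # A failed episode: the agent moves right but the goal is far away.
    far_goal = (10.0, 10.0)
    episode = [
        ((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), far_goal),
        ((1.0, 0.0), (1.0, 0.0), (2.0, 0.0), far_goal),
        ((2.0, 0.0), (1.0, 0.0), (3.0, 0.0), far_goal),
    ]
    buffer = hindsight_relabel(episode, k=2)
    successes = sum(1 for sample in buffer if sample[4] == 0.0)
    print(len(buffer), "samples,", successes, "with reward 0.0")
```

In this sketch every original sample of the failed episode has reward -1.0, while the relabeled copies include samples with reward 0.0, which is the sense in which hindsight replay makes sparse-reward data more useful.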

Cooperative Encirclement Strategy for Multiple Drones Based on ATT-MADDPG

Large Scale Pursuit-Evasion under Collision Avoidance Using Deep Reinforcement Learning

Distance-based Multiple Non-cooperative Ground Target Encirclement for Complex Environments

Multi-Target Pursuit by a Decentralized Heterogeneous UAV Swarm using Deep Multi-Agent Reinforcement Learning

Faster Target Encirclement with Utilization of Obstacles via Multi-Agent Reinforcement Learning

Multi-robot Target Encirclement Control with Collision Avoidance via Deep Reinforcement Learning

Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards

Crafting a robotic swarm pursuit–evasion capture strategy using deep reinforcement learning

Cooperative Pursuit with Multiple Pursuers based on Deep Minimax Q-learning

Game of Drones: Multi-UAV Pursuit-Evasion Game With Online Motion Planning by Deep Reinforcement Learning

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

A Reinforcement Learning-based Decentralized Method of Avoiding Multi-UAV Collision in 3-D Airspace

UAV Cooperative Air Combat Maneuvering Confrontation Based on Multi-agent Reinforcement Learning

Learning Multi-Pursuit Evasion for Safe Targeted Navigation of Drones

Deep Reinforcement Learning-Driven Collaborative Rounding-Up for Multiple Unmanned Aerial Vehicles in Obstacle Environments

Multiple noncooperative targets encirclement by relative distance-based positioning and neural antisynchronization control

A Cooperative-Competitive Strategy for Autonomous Multidrone Racing

Multi-UAV DMPC Cooperative Guidance with Constraints of Terminal Angle and Obstacle Avoidance

Multiple unmanned aerial vehicle coordinated strikes against ground targets based on an improved multi-agent deep deterministic policy gradient algorithm

Collision-Avoiding Flocking With Multiple Fixed-Wing UAVs in Obstacle-Cluttered Environments: A Task-Specific Curriculum-Based MADRL Approach

UAVs rounding up inspired by communication multi-agent depth deterministic policy gradient