Abstract: Multi-agent multi-target search strategies can be used in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To overcome the limitation of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained with DRL tend to be brittle because they are sensitive to the training environment, so the learned strategies frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL make it difficult for the system to converge and lower the utilization efficiency of the sampled data. To address the weakened robustness of the agents and the sparse rewards in the multi-target search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient algorithm based on Parallel Hindsight Experience Replay (PHER-M3DDPG), which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse reward problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism that increases the efficiency of data utilization by introducing virtual learning targets and processing the sampled data in batches. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
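The idea of virtual learning targets can be illustrated with a minimal sketch of standard hindsight experience replay using the "final state" relabeling strategy. This is not the PHER-M3DDPG implementation: the parallel batch processing and the minimax component are omitted, and the names `Transition`, `sparse_reward`, `relabel_with_final_state`, and the tolerance `tol` are assumptions made here for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Hypothetical transition layout: (state, action, next_state, goal, reward)
Transition = Tuple[Point, Point, Point, Point, float]


def sparse_reward(position: Point, goal: Point, tol: float = 0.1) -> float:
    """Return 0.0 when the position is within tol of the goal, otherwise -1.0."""
    dx = position[0] - goal[0]
    dy = position[1] - goal[1]
    return 0.0 if (dx * dx + dy * dy) ** 0.5 <= tol else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Copy an episode, replacing the goal with the state actually reached.

    The last next_state becomes a virtual goal, and each reward is recomputed
    against it, so a failed episode still yields a non-trivial reward signal.
    """
    virtual_goal = episode[-1][2]
    return [
        (state, action, next_state, virtual_goal,
         sparse_reward(next_state, virtual_goal))
        for state, action, next_state, _goal, _reward in episode
    ]


# Usage: an agent that never reached the real target at (5, 5).
real_goal = (5.0, 5.0)
episode = [
    ((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), real_goal, -1.0),
    ((1.0, 0.0), (1.0, 0.0), (2.0, 0.0), real_goal, -1.0),
]
replay_buffer = episode + relabel_with_final_state(episode)
```

Storing both the original and the relabeled transitions doubles the data obtained from each rollout, which is the data-efficiency argument for hindsight replay under sparse rewards.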

Multi-agent Reinforcement Learning for a Special Formation Problem

Moving Forward in Formation: A Decentralized Hierarchical Learning Approach to Multi-Agent Moving Together

Reinforcement learning for multi-agent formation navigation with scalability

Sim-real joint experimental verification for an unmanned surface vehicle formation strategy based on multi-agent deterministic policy gradient and line of sight guidance

Mapless Collaborative Navigation for a Multi-Robot System Based on the Deep Reinforcement Learning

Underwater Multi-agent Cooperative Formation Hunting Based on Deep Reinforcement Learning

A Method of UAV Formation Transformation Based on Reinforcement Learning Multi-agent

Multi-Agent Confrontation Game Based on Multi-Agent Reinforcement Learning

Obstacle Avoidance in Multi-Agent Formation Process Based on Deep Reinforcement Learning

Multi-UAV Behavior-based Formation with Static and Dynamic Obstacles Avoidance via Reinforcement Learning

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

The crowd cooperation approach for formation maintenance and collision avoidance using multi-agent deep reinforcement learning

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

Leader-follower Formation Control for a Multi-missile System Via Deep Reinforcement Learning

A Multi-agent Formation Control Algorithm in Interference Environment

Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning

Autonomous and cooperative control of UAV cluster with multi-agent reinforcement learning

A Deep Reinforcement Learning-Based Method Applied for Solving Multi-Agent Defense and Attack Problems

UAV Swarm Confrontation Based on Multi-agent Deep Reinforcement Learning

Group-Based Deep Reinforcement Learning in Multi-UAV Confrontation

Distributed deep reinforcement learning based on bi-objective framework for multi-robot formation