Abstract: Multi-agent multi-target search strategies can be applied in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To overcome the limitation of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained with DRL tend to be brittle because they are sensitive to the training environment, so the learned strategies frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL make system convergence difficult and lower the utilization efficiency of the sampled data. To address the weakened robustness of the agents and the sparse rewards in the multi-objective search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient based on the Parallel Hindsight Experience Replay (PHER-M3DDPG) algorithm, which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse-reward problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism, which increases the efficiency of data utilization through virtual learning targets and batch processing of the sampled data. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
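The role of virtual learning targets under sparse rewards can be illustrated with a minimal sketch of generic hindsight relabeling. This is not the PHER-M3DDPG implementation: it omits the parallel batch processing and the minimax component, and every name in it (`Transition`, `sparse_reward`, `relabel_with_hindsight`, the tolerance `tol`, the count `k`) is a hypothetical choice made for illustration. It assumes a 2-D position as the state and a reward of 0 on reaching the target and -1 otherwise.

```python
import math
import random
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[float, float]


class Transition(NamedTuple):
    state: Point       # agent position before the action
    action: Point      # displacement applied by the agent
    next_state: Point  # agent position after the action
    goal: Point        # target the transition is evaluated against
    reward: float


def sparse_reward(position: Point, goal: Point, tol: float = 0.1) -> float:
    """Assumed sparse reward: 0 within `tol` of the goal, otherwise -1."""
    return 0.0 if math.dist(position, goal) <= tol else -1.0


def relabel_with_hindsight(
    episode: List[Transition],
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Return the episode plus k relabeled copies of each transition.

    Each copy replaces the real goal with a virtual goal: a position the
    agent actually reached at the same or a later step of the episode.
    """
    rng = rng or random.Random(0)
    augmented = list(episode)  # keep the original transitions unchanged
    for t, transition in enumerate(episode):
        reachable = episode[t:]  # steps from t to the end of the episode
        for _ in range(k):
            virtual_goal = rng.choice(reachable).next_state
            augmented.append(
                transition._replace(
                    goal=virtual_goal,
                    reward=sparse_reward(transition.next_state, virtual_goal),
                )
            )
    return augmented
```

In an episode that never reaches its real target, every original reward is -1, yet the relabeled copies of the final step always earn reward 0 because their virtual goal is the position the agent ended at. Failed episodes therefore still yield informative samples for the replay buffer.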

Distributed Multi-agent Soft Actor-Critic Algorithm With Probabilistic Prioritized Experience Replay

Prioritized Experience Replay in Multi-Actor-Attention-Critic for Reinforcement Learning

Progressive Prioritized Experience Replay for Multi-Agent Reinforcement Learning

A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning

Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Prioritized Experience Replay for Multi-agent Cooperation

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Deep reinforcement learning algorithm based on multi-agent parallelism and its application in game environment

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

The Implementation of Asynchronous Advantage Actor-Critic with Stigmergy in Network-assisted Multi-agent System

Improving Sample Efficiency in Multi-Agent Actor-Critic Methods

AccMER: Accelerating Multi-Agent Experience Replay with Cache Locality-aware Prioritization

Data-Based Optimal Consensus Control for Multiagent Systems with Time Delays: Using Prioritized Experience Replay

MAC-PO: Multi-Agent Experience Replay via Collective Priority Optimization

Cooperative multi-agent game based on reinforcement learning

Hybrid Attention-Oriented Experience Replay for Deep Reinforcement Learning and Its Application to a Multi-Robot Cooperative Hunting Problem

A priority experience replay actor-critic algorithm using self-attention mechanism for strategy optimization of discrete problems

Decomposed and Prioritized Experience Replay-based MADDPG Algorithm for Multi-UAV Confrontation

DDMA: Discrepancy-Driven Multi-agent Reinforcement Learning

A Multi-Agent Adaptive Co-Evolution Method in Dynamic Environments