Abstract: Multi-agent multi-target search strategies can be used in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To overcome the limitation of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained with DRL tend to be brittle because of their sensitivity to the training environment, so the learned strategies frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL make convergence difficult and lower the utilization efficiency of the sampled data. To address the weakened robustness of the agents and the sparse rewards in the multi-target search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient algorithm based on Parallel Hindsight Experience Replay (PHER-M3DDPG), which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse rewards problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism, which increases the efficiency of data utilization by introducing virtual learning targets and batch processing of the sampled data. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
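The idea of virtual learning targets can be illustrated with a minimal sketch of hindsight relabeling applied to a batch of episodes at once. This is not the PHER-M3DDPG implementation: it assumes the generic "final" relabeling strategy from hindsight experience replay, a sparse reward of 0 on success and -1 otherwise, and hypothetical names (`sparse_reward`, `relabel_final`, `tol`) chosen only for illustration.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    """Return 0 where the achieved position is within tol of the goal, else -1."""
    dist = np.linalg.norm(achieved - goal, axis=-1)
    return np.where(dist <= tol, 0.0, -1.0)

def relabel_final(achieved, goals, tol=0.05):
    """Relabel a whole batch of episodes with virtual goals (assumed 'final' strategy).

    achieved: array (episodes, steps, dim), position reached after each step
    goals:    array (episodes, dim), the goals the agents were actually given
    Returns the rewards under the original goals, the virtual goals, and the
    rewards recomputed under the virtual goals.
    """
    original_r = sparse_reward(achieved, goals[:, None, :], tol)
    # Virtual goal: pretend the last position reached was the target all along.
    virtual_goals = achieved[:, -1, :]
    virtual_r = sparse_reward(achieved, virtual_goals[:, None, :], tol)
    return original_r, virtual_goals, virtual_r

# Two hypothetical 1-D episodes of three steps; neither reaches its real goal.
achieved = np.array([[[0.1], [0.2], [0.3]],
                     [[0.5], [0.4], [0.4]]])
goals = np.array([[1.0], [0.0]])
original_r, virtual_goals, virtual_r = relabel_final(achieved, goals)
```

Under the original goals every step in this toy batch is a failure, so the replay buffer would carry no success signal. After relabeling, each episode contains at least one successful transition, which is how hindsight replay extracts learning signal from sparse-reward data. The relabeling is vectorized over the episode axis as a loose stand-in for batch processing of the sampled data.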

Generating Attentive Goals for Prioritized Hindsight Reinforcement Learning

Guided Goal Generation for Hindsight Multi-Goal Reinforcement Learning

Exploration via Hindsight Goal Generation

Combining Hindsight with Goal-enhanced Prediction for Multi-goal Reinforcement Learning

Efficient Multi-Goal Reinforcement Learning Via Value Consistency Prioritization

Addressing Hindsight Bias in Multigoal Reinforcement Learning

Quantile Regression Hindsight Experience Replay

Improvements on Hindsight Learning

Hindsight Planner

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

Bias-reduced Multi-step Hindsight Experience Replay for Efficient Multi-goal Reinforcement Learning

MHER: Model-based Hindsight Experience Replay

AHEGC: Adaptive Hindsight Experience Replay with Goal-Amended Curiosity Module for Robot Control

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

Consistent Experience Replay in High-Dimensional Continuous Control with Decayed Hindsights

Clustering-based Failed goal Aware Hindsight Experience Replay

Prioritized Generative Replay

Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning

Hierarchical reinforcement learning for handling sparse rewards in multi-goal navigation

Solving Robotic Manipulation With Sparse Reward Reinforcement Learning Via Graph-Based Diversity and Proximity