Abstract: Multi-agent multi-target search strategies can be applied in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To overcome the limitation of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained with DRL tend to be brittle because they are sensitive to the training environment, so the strategies they learn frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL make system convergence difficult and lower the utilization efficiency of the sampled data. To address both the weakened robustness of the agents and the sparse rewards in the multi-objective search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient based on the Parallel Hindsight Experience Replay (PHER-M3DDPG) algorithm, which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse rewards problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism that increases the efficiency of data utilization by introducing virtual learning targets and processing the sampled data in batches. Simulation results show that the PHER-M3DDPG algorithm outperforms the existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
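The idea of a virtual learning target can be illustrated with a minimal sketch of standard hindsight relabeling (the "final" strategy), not the paper's parallel PHER mechanism. It assumes a state is a 2-D position, the reward is 0 within a tolerance of the goal and -1 otherwise, and the names `sparse_reward`, `relabel_final`, and `tol` are hypothetical choices made only for this illustration.

```python
import numpy as np


def sparse_reward(achieved, goal, tol=0.5):
    """Assumed sparse reward: 0 when within `tol` of the goal, otherwise -1."""
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0


def relabel_final(episode, tol=0.5):
    """Relabel an episode with a virtual goal (hindsight 'final' strategy).

    `episode` is a list of (state, action, next_state, goal) tuples.
    The position reached at the end of the episode replaces the original
    goal, and every reward is recomputed against that virtual goal.
    """
    virtual_goal = episode[-1][2]  # position actually reached at the end
    relabeled = []
    for state, action, next_state, _original_goal in episode:
        reward = sparse_reward(next_state, virtual_goal, tol)
        relabeled.append((state, action, reward, next_state, virtual_goal))
    return relabeled


# A failed episode: the agent moves right three times and never reaches (10, 10).
true_goal = np.array([10.0, 10.0])
step = np.array([1.0, 0.0])
positions = [np.array([float(i), 0.0]) for i in range(4)]
episode = [(positions[i], step, positions[i + 1], true_goal) for i in range(3)]

original_rewards = [sparse_reward(t[2], true_goal) for t in episode]
relabeled_rewards = [t[2] for t in relabel_final(episode)]
print(original_rewards)   # every step is penalized under the true goal
print(relabeled_rewards)  # the last step succeeds under the virtual goal
```

Under the true goal the episode carries no success signal, whereas the relabeled copy ends in a success, which is how hindsight replay extracts a learning signal from sparse-reward data. Storing both the original and the relabeled transitions in the replay buffer is the usual design choice.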

Selective Policy Transfer in Multi-Agent Systems with Sparse Interactions

Multi-Agent Policy Transfer Via Task Relationship Modeling.

Learning in Multi-Agent Systems with Sparse Interactions by Knowledge Transfer and Game Abstraction

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

ATA-MAOPT: Multi-Agent Online Policy Transfer Using Attention Mechanism With Time Abstraction

Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns.

A Versatile Agent for Fast Learning from Human Instructors

Off-Agent Trust Region Policy Optimization

Accurate policy detection and efficient knowledge reuse against multi-strategic opponents

Stochastic Ensemble Policy Transfer

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Enabling Inter-Agent Transfer for Multi-Agent Learning System by Incorporating Role Reversal

Research on Isomorphic Task Transfer Algorithm Based on Knowledge Distillation in Multi-Agent Collaborative Systems

Preference-based experience sharing scheme for multi-agent reinforcement learning in multi-target environments

Efficient Policy Detecting and Reusing for Non-Stationarity in Markov Games.

SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning.

Parallel Knowledge Transfer in Multi-Agent Reinforcement Learning

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real

Protective Policy Transfer

Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning