Abstract: Multi-agent multi-target search strategies can be used in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To overcome the limitation of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained with DRL tend to be brittle because they are sensitive to the training environment, so the strategies they learn frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL lead to problems such as difficult convergence and inefficient use of the sampled data. To address the weakened robustness of the agents and the sparse rewards in the multi-target search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient algorithm based on Parallel Hindsight Experience Replay (PHER-M3DDPG), which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse-reward problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism, which increases data utilization efficiency by introducing virtual learning targets and processing the sampled data in batches. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
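The "virtual learning targets" mentioned in the abstract can be illustrated with the standard hindsight experience replay idea: a failed episode is stored a second time with the goal replaced by a position the agent actually reached, so that at least some transitions carry a non-negative reward. The sketch below is a minimal single-agent illustration under stated assumptions, not the paper's parallel mechanism or its M3DDPG training loop. The names `Transition`, `sparse_reward`, and `relabel_with_final_state`, the 2-D point states, and the tolerance of 0.1 are all hypothetical choices made for this example.

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Transition(NamedTuple):
    state: Point       # agent position before the step
    action: Point      # displacement applied during the step
    next_state: Point  # agent position after the step
    goal: Point        # target the agent was asked to reach
    reward: float      # sparse reward for this step


def sparse_reward(position: Point, goal: Point, tol: float = 0.1) -> float:
    """Assumed sparse reward: 0.0 within `tol` of the goal, otherwise -1.0."""
    return 0.0 if math.dist(position, goal) <= tol else -1.0


def relabel_with_final_state(
    episode: List[Transition], tol: float = 0.1
) -> List[Transition]:
    """Copy an episode, treating the last reached position as a virtual goal."""
    virtual_goal = episode[-1].next_state
    return [
        t._replace(
            goal=virtual_goal,
            reward=sparse_reward(t.next_state, virtual_goal, tol),
        )
        for t in episode
    ]


# A short episode that never reaches the real target at (5, 5).
real_goal: Point = (5.0, 5.0)
positions: List[Point] = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]

episode: List[Transition] = []
for s, s_next in zip(positions, positions[1:]):
    action = (s_next[0] - s[0], s_next[1] - s[1])
    episode.append(
        Transition(s, action, s_next, real_goal, sparse_reward(s_next, real_goal))
    )

# Store both the original and the relabeled transitions for replay.
relabeled = relabel_with_final_state(episode)
replay_buffer = episode + relabeled
```

In this toy episode every original transition has reward -1.0, while the last relabeled transition has reward 0.0 because the virtual goal is, by construction, the position the agent ended at. An off-policy learner that samples from `replay_buffer` therefore sees a success signal even though the real target was never found.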

Accurate policy detection and efficient knowledge reuse against multi-strategic opponents

Efficiently tracking multi-strategic opponents: A context-aware Bayesian policy reuse approach

Efficient Policy Detecting and Reusing for Non-Stationarity in Markov Games

Adaptive algorithm for multi-agent learning optimal cooperative pursuit strategy based on Markov game

Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

Efficient use of heuristics for accelerating XCS-based policy learning in Markov games

Robust optimal policies for team Markov games

Selective Policy Transfer in Multi-Agent Systems with Sparse Interactions

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

Opponent portrait for multiagent reinforcement learning in competitive environment

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

A Robust Policy Bootstrapping Algorithm for Multi-objective Reinforcement Learning in Non-stationary Environments

Multi-agent Reinforcement Learning with Approximate Model Learning for Competitive Games

MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning in Asymmetric Imperfect-Information Games

Approximate Policy Iteration for Robust Stochastic Control of Multi-agent Markov Decision Processes

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Metric Policy Representations for Opponent Modeling