Abstract: Multi-agent multi-target search strategies can be used in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To overcome the limitation of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained with DRL tend to be brittle because they are sensitive to the training environment, so the strategies they learn frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL lead to problems such as difficult convergence and inefficient use of the sampled data. To address the weakened robustness of the agents and the sparse rewards in the multi-target search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient algorithm based on Parallel Hindsight Experience Replay (PHER-M3DDPG), which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse-reward problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism that increases the efficiency of data utilization by introducing virtual learning targets and processing the sampled data in batches. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
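The idea of virtual learning targets can be illustrated with a minimal sketch of ordinary hindsight relabeling. It is not the PHER-M3DDPG implementation: it assumes the common "final state" relabeling strategy, a 2-D position as the achieved state, and a sparse reward of 0 on success and -1 otherwise. The names `sparse_reward`, `hindsight_relabel`, `relabel_batch`, and the tolerance `tol` are hypothetical and chosen only for illustration.

```python
import math


def sparse_reward(achieved, goal, tol=0.5):
    """Assumed sparse reward: 0.0 if within tol of the goal, otherwise -1.0."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def hindsight_relabel(episode, tol=0.5):
    """Store each transition twice: once with the real goal and once with a
    virtual goal taken from the final achieved position of the episode.

    episode: list of (obs, action, next_pos, goal) tuples (hypothetical layout).
    Returns a list of (obs, action, reward, next_pos, goal) tuples.
    """
    virtual_goal = episode[-1][2]  # where the agent actually ended up
    transitions = []
    for obs, action, next_pos, goal in episode:
        # Original transition: usually reward -1.0 under sparse rewards.
        transitions.append(
            (obs, action, sparse_reward(next_pos, goal, tol), next_pos, goal))
        # Relabeled transition: the last step is guaranteed to succeed.
        transitions.append(
            (obs, action, sparse_reward(next_pos, virtual_goal, tol),
             next_pos, virtual_goal))
    return transitions


def relabel_batch(episodes, tol=0.5):
    """Relabel several agents' episodes in one pass, a simple stand-in for
    batch processing of the sampled data."""
    return [t for ep in episodes for t in hindsight_relabel(ep, tol)]


if __name__ == "__main__":
    # An agent that moves right but never reaches the real goal at (10, 10).
    episode = [
        ((0, 0), (1, 0), (1, 0), (10, 10)),
        ((1, 0), (1, 0), (2, 0), (10, 10)),
        ((2, 0), (1, 0), (3, 0), (10, 10)),
    ]
    for transition in hindsight_relabel(episode):
        print(transition)
```

In this sketch every original transition carries the failure reward, while the relabeled copy of the last step carries a success reward, so a failed episode still yields a useful learning signal.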

Towards Efficient Detection and Optimal Response against Sophisticated Opponents.

Bayes-ToMoP: A Fast Detection and Best Response Algorithm Towards Sophisticated Opponents.

Think That Attackers Think: Using First-Order Theory of Mind in Intrusion Response System.

Adaptive algorithm for multi-agent learning optimal cooperative pursuit strategy based on Markov game

Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

Detecting and Tracing Multi-Strategic Agents with Opponent Modelling and Bayesian Policy Reuse

An Improved Approach Towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning

Accurate policy detection and efficient knowledge reuse against multi-strategic opponents

OM-TCN: A Dynamic and Agile Opponent Modeling Approach for Competitive Games

Model-Based Opponent Modeling

A Dynamically Adaptive Approach to Reducing Strategic Interference for Multi-agent Systems

Best Response Shaping

Adversarial Decision-Making for Moving Target Defense: A Multi-Agent Markov Game and Reinforcement Learning Approach

Efficiently tracking multi-strategic opponents: A context-aware Bayesian policy reuse approach

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Finding Friend and Foe in Multi-Agent Games

Opponent portrait for multiagent reinforcement learning in competitive environment

Modeling Theory of Mind in Multi-Agent Games Using Adaptive Feedback Control

Adversarial Decision Making Against Intelligent Targets in Cooperative Multiagent Systems

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning