Abstract: Multi-agent multi-target search strategies can be used in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. To handle the problem of fixed targets and trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained with DRL tend to be brittle because they are sensitive to the training environment, so the strategies they learn frequently fall into local optima, resulting in poor system robustness. Additionally, sparse rewards in DRL make it difficult for the system to converge and lower the utilization efficiency of the sampled data. To address the weakened robustness of the agents and the sparse rewards in the multi-target search environment, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient algorithm based on Parallel Hindsight Experience Replay (PHER-M3DDPG), which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategy of the agents by introducing adversarial disturbances. In addition, to solve the sparse-reward problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism, which increases data utilization efficiency by introducing virtual learning targets and processing the sampled data in batches. Simulation results show that the PHER-M3DDPG algorithm outperforms existing algorithms in terms of convergence speed and task completion time in a multi-target search environment.
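
The role of virtual learning targets under sparse rewards can be illustrated with a minimal sketch of plain hindsight experience replay. This is not the PHER-M3DDPG implementation: the 2-D positions, the `sparse_reward` function, its tolerance `tol`, the "future" goal-sampling strategy, and the parameter `k` are all assumptions made for illustration, and the parallel batch processing and minimax training described in the abstract are not modeled.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
# One stored step: (state, action, next_state, goal, reward).
Transition = Tuple[Point, Point, Point, Point, float]


def sparse_reward(position: Point, goal: Point, tol: float = 0.1) -> float:
    """Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise."""
    dx, dy = position[0] - goal[0], position[1] - goal[1]
    return 0.0 if (dx * dx + dy * dy) ** 0.5 <= tol else -1.0


def relabel_with_hindsight(
    episode: List[Transition],
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Create extra transitions whose goal is a state reached later on.

    For each step, k virtual goals are drawn from the next_states of the
    current and later steps, and the reward is recomputed against each
    virtual goal, so some relabeled transitions carry a success signal.
    """
    rng = rng or random.Random(0)
    relabeled: List[Transition] = []
    for t, (state, action, next_state, _goal, _reward) in enumerate(episode):
        future = episode[t:]
        for _ in range(k):
            virtual_goal = rng.choice(future)[2]
            reward = sparse_reward(next_state, virtual_goal)
            relabeled.append((state, action, next_state, virtual_goal, reward))
    return relabeled


# A failed episode: the agent moves right but the real target is far away.
target: Point = (10.0, 10.0)
path: List[Point] = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
episode: List[Transition] = [
    (path[i], (1.0, 0.0), path[i + 1], target, sparse_reward(path[i + 1], target))
    for i in range(len(path) - 1)
]
extra = relabel_with_hindsight(episode, k=2)
```

In this sketch every original transition has reward -1, while the relabeled set contains transitions with reward 0, which is the extra learning signal that hindsight relabeling is meant to provide.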

Multi-objective Sensor Management Method Based on Twin Delayed Deep Deterministic policy gradient algorithm

A Sensor Management Algorithm Based on Deviation Matrix Control

Satellite Attitude Tracking Decision Method based on Deep Deterministic Policy Gradient for Moving Target Observation

Decomposed POMDP Optimization-Based Sensor Management for Multi-Target Tracking in Passive Multi-Sensor Systems

Stochastic Steepest-Descent Optimization Of Multiple-Objective Mobile Sensor Coverage

Multi-Sensor Management for Multi-Target Tracking Using Mutual Information

Satellite Attitude Tracking Control of Moving Targets Combining Deep Reinforcement Learning and Predefined-time Stability Considering Energy Optimization

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Evolutionary Computational Intelligence-Based Multi-Objective Sensor Management for Multi-Target Tracking

Distributed Multi-Sensor Control for Multi-Target Tracking With a Sparsity-Promoting Objective Function

Multi-Sensor Control for Multi-Object Bayes Filters

Adaptive Sensor Scheduling Algorithm for Target Tracking in Wireless Sensor Networks

Multisensor Management Method for Ground Moving Target Tracking Based on Doppler Blind Zone Information

Airborne Self-adaptive Multi-sensor Management

A Novel Sensor Scheduling Algorithm Based on Deep Reinforcement Learning for Bearing-Only Target Tracking in UWSNs

Optimal Policies Search for Sensor Management

An efficient multi-objective optimization approach for sensor management via multi-Bernoulli filtering

An end-to-end sensor scheduling method based on D3QN for underwater passive tracking in UWSNs

Multi-Objective Optimization Based Multi-Bernoulli Sensor Selection for Multi-Target Tracking

Sensor Management with Dynamic Clustering for Bearings-Only Multi-Target Tracking via Swarm Intelligence Optimization

Multisensor Management Algorithm for Airborne Sensors Using Frank-Wolfe Method