Abstract: Target search and detection encompasses a variety of decision problems, such as coverage, surveillance, search, observation, and pursuit-evasion, among others. In this paper, we develop a multi-agent deep reinforcement learning (MADRL) method to coordinate a group of aerial vehicles (drones) to locate a set of static targets in an unknown area. To that end, we designed a realistic drone simulator that replicates the dynamics and perturbations of a real experiment, with models built on statistical inferences drawn from experimental data. Our reinforcement learning method, trained with this simulator, found near-optimal policies for the drones. In contrast to other state-of-the-art MADRL methods, our method is fully decentralized during both learning and execution, can handle high-dimensional and continuous observation spaces, and does not require tuning of additional hyperparameters.

What problem does this paper attempt to address?

The paper addresses how to coordinate a team of drones (a multi-agent system) to search for and detect multiple static targets efficiently in an unknown, large-scale environment. Specifically, it proposes a multi-agent deep reinforcement learning (MADRL) method that lets the drone team complete the task autonomously, without centralized control.

### Background and Challenges

1. **Multi-target search and detection in complex environments**
   - Target search and detection involves several decision-making problems, such as coverage, surveillance, search, observation, and pursuit-evasion.
   - In practice, military and emergency response teams often need to locate missing persons or survivors in disaster scenarios.
2. **Limitations of existing methods**
   - Traditional methods usually divide the surveillance area into cells (such as Voronoi cells) and design a path-planning algorithm for each cell.
   - These methods require direct communication between drones, struggle to cope with drone failures during operation, and cannot guarantee that the final solution is optimal.

### Proposed Solution

1. **Multi-agent deep reinforcement learning (MADRL) method**
   - The paper proposes a fully decentralized MADRL method called Decentralized Advantage Actor-Critic (DA2C).
   - The method is decentralized during both learning and execution, can handle high-dimensional, continuous observation spaces, and does not require tuning additional hyperparameters.
2. **Simulator design**
   - A realistic drone simulator is developed for training and evaluating the reinforcement learning models.
   - The simulator accounts for the dynamics and perturbations observed in real experiments, and its models are based on statistical inferences drawn from experimental data.

### Main Contributions

1. **Decentralization**: Unlike other MADRL methods, this method requires no communication during either learning or execution, which improves the robustness and adaptability of the system.
2. **Efficiency**: Experimental results show that the method finds near-optimal policies within a relatively short training time and significantly outperforms random and collision-free policies.
3. **Scalability**: The paper studies how the numbers of drones and targets affect task performance. Adding drones raises the target detection success rate, but the improvement becomes marginal beyond a certain number of drones.

### Key Formulas

- **Value function (expected discounted reward)**

  \[
  V^\pi(s)=\mathbb{E}\left[\sum_{t=0}^{h-1}\gamma^t R(\vec{a}_t,s_t)\mid s,\pi\right]
  \]

  where \(V^\pi(s)\) is the expected discounted reward starting from state \(s\) under policy \(\pi\), \(h\) is the horizon, \(\gamma\) is the discount factor, and \(\vec{a}_t\) is the joint action at step \(t\).

- **Policy gradient theorem**

  \[
  \nabla_\theta J(\theta)=\mathbb{E}_{s,a\sim\pi}\left[Q^\pi(s,a)\nabla_\theta\log\pi_\theta(a\mid s)\right]
  \]

- **Policy gradient with baseline**

  \[
  \nabla_\theta J(\theta)=\mathbb{E}_{s,a\sim\pi}\left[(Q^\pi(s,a)-b(s))\nabla_\theta\log\pi_\theta(a\mid s)\right]
  \]

  where \(b(s)=V^\pi(s)\) is the baseline, so \(Q^\pi(s,a)-V^\pi(s)\) is the advantage of action \(a\) in state \(s\).

- **Loss function**

  \[
  L=\lambda_\pi L_\pi+\lambda_v L_v-\lambda_H\,\mathbb{E}_{s\sim\pi}\left[H(\pi(\cdot\mid s))\right]
  \]

  where \(L_\pi\) is the policy loss, \(L_v\) is the value loss, \(H\) is the entropy of the policy, and \(\lambda_\pi\), \(\lambda_v\), \(\lambda_H\) are weighting coefficients.
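The value function is the expectation of a finite-horizon discounted sum of rewards, and actor-critic methods usually estimate it from sampled episodes. The sketch below is a minimal illustration, assuming one drone's reward sequence for a single episode and a critic that outputs \(V(s_t)\); the helper names, rewards and critic values are hypothetical, not taken from the paper. It computes the sampled return \(G_t\) at every step and the advantage \(G_t - V(s_t)\) used in the baseline form of the policy gradient.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """G_t = sum_{k=t}^{h-1} gamma**(k - t) * r_k for every step t of a finite episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Sampled advantage G_t - V(s_t): the return stands in for Q, the critic for b(s)."""
    return [g - v for g, v in zip(returns, values)]


# Hypothetical 4-step episode: a target is detected only at the last step.
rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)
critic_values = [0.5, 0.6, 0.8, 0.9]  # hypothetical critic outputs V(s_t)
print(returns)
print(advantages(returns, critic_values))
```

Computing the returns backwards keeps the cost linear in the episode length; averaging \(G_0\) over many episodes started from the same state estimates \(V^\pi(s)\).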
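Subtracting a baseline \(b(s)\) does not bias the policy gradient, because \(\mathbb{E}_{a\sim\pi}[\nabla_\theta\log\pi_\theta(a\mid s)]=0\). The sketch below checks this numerically for one state, assuming a hypothetical tabular softmax policy whose parameters \(\theta\) are the logits over a few discrete moves. The paper's drones use learned network policies, and the Q-values here are made up.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def grad_log_pi(logits: np.ndarray, action: int) -> np.ndarray:
    """Gradient of log pi(action) with respect to the logits: one_hot(action) - pi."""
    g = -softmax(logits)
    g[action] += 1.0
    return g


def expected_gradient(logits: np.ndarray, q_values: np.ndarray, baseline: float) -> np.ndarray:
    """Exact E_{a~pi}[(Q(s, a) - b(s)) * grad log pi(a | s)] for a single state s."""
    pi = softmax(logits)
    total = np.zeros_like(logits)
    for a in range(len(logits)):
        total += pi[a] * (q_values[a] - baseline) * grad_log_pi(logits, a)
    return total


logits = np.array([0.5, -1.0, 2.0, 0.0])    # hypothetical policy over four moves
q_values = np.array([1.0, 0.2, 3.0, -0.5])  # hypothetical Q^pi(s, a)
print(np.allclose(expected_gradient(logits, q_values, 0.0),
                  expected_gradient(logits, q_values, 10.0)))  # True
```

Any constant baseline leaves the expected gradient unchanged; the practical benefit of choosing \(b(s)=V^\pi(s)\) is a lower-variance sampled estimate, which this exact-expectation check does not show.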
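The combined loss can be evaluated as in the sketch below. It assumes discrete actions, a batch of one drone's own experience, \(L_\pi=-\mathbb{E}[(G_t-V(s_t))\log\pi(a_t\mid s_t)]\) and a mean-squared-error \(L_v\) (common choices, assumed here), and hypothetical weights \(\lambda_\pi=1\), \(\lambda_v=0.5\), \(\lambda_H=0.01\), since the paper's values are not given in this summary. In a fully decentralized setup each drone would compute such a loss on its own trajectory with its own actor and critic. This NumPy version only evaluates the loss; a training implementation would use an autodiff framework and stop gradients through the advantage in \(L_\pi\).

```python
import numpy as np


def a2c_loss(log_probs_all, actions, returns, values,
             lam_pi=1.0, lam_v=0.5, lam_h=0.01):
    """Evaluate L = lam_pi * L_pi + lam_v * L_v - lam_h * mean entropy.

    log_probs_all: (T, A) log pi(a | s_t) for every action a at step t
    actions:       (T,)   integer actions actually taken
    returns:       (T,)   discounted returns G_t (sampled estimate of Q)
    values:        (T,)   critic outputs V(s_t), used as the baseline
    The default weights are hypothetical placeholders.
    """
    log_probs_all = np.asarray(log_probs_all, dtype=float)
    actions = np.asarray(actions, dtype=int)
    returns = np.asarray(returns, dtype=float)
    values = np.asarray(values, dtype=float)

    taken = log_probs_all[np.arange(len(actions)), actions]  # log pi(a_t | s_t)
    advantage = returns - values                             # Q estimate minus baseline
    policy_loss = -np.mean(advantage * taken)                # assumed form of L_pi
    value_loss = np.mean((returns - values) ** 2)            # assumed L_v: mean squared error
    probs = np.exp(log_probs_all)
    entropy = -np.sum(probs * log_probs_all, axis=1).mean()  # mean per-state entropy H
    return lam_pi * policy_loss + lam_v * value_loss - lam_h * entropy


# Two steps of a uniform policy over four moves (hypothetical data).
log_probs = np.log(np.full((2, 4), 0.25))
loss = a2c_loss(log_probs, actions=[0, 1], returns=[1.0, 0.0], values=[0.5, 0.5])
print(loss)
```

The entropy term enters with a minus sign, so minimizing \(L\) rewards more exploratory (higher-entropy) policies, with \(\lambda_H\) controlling how strongly.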

Decentralized Reinforcement Learning for Multi-Target Search and Detection by a Team of Drones

Mean-Field Multi-Agent Reinforcement Learning for UAV Assisted Secure Data Dissemination

Multi-Target Pursuit by a Decentralized Heterogeneous UAV Swarm using Deep Multi-Agent Reinforcement Learning

Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay

Autonomous UAV-based surveillance system for multi-target detection using reinforcement learning

Multi-Agent Reinforcement Learning for Distributed Cooperative Targets Search

Multi-UAV Cooperative Search in Multi-Layered Aerial Computing Networks: A Multi-Agent Deep Reinforcement Learning Approach

Matching combined multi-agent reinforcement learning for UAV secure data dissemination

Multi-UAV Collaborative Detection Based on Reinforcement Learning

MARLander: A Local Path Planning for Drone Swarms using Multiagent Deep Reinforcement Learning

Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning

Multi-target tracking for unmanned aerial vehicle swarms using deep reinforcement learning

Multi-UAV Collaborative Search and Strike based on Reinforcement Learning

Application of Deep Reinforcement Learning to UAV Swarming for Ground Surveillance

UAV Swarm Cooperative Target Search: A Multi-Agent Reinforcement Learning Approach

Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones

Collaborative Target Search with a Visual Drone Swarm: An Adaptive Curriculum Embedded Multistage Reinforcement Learning Approach

Deep Reinforcement Learning Multi-UAV Trajectory Control for Target Tracking

Decentralized Learning Control for Multi-UAV Swarm Simultaneous Coverage and Tracking

A Two-Stage Target Search and Tracking Method for UAV Based on Deep Reinforcement Learning

Game of Drones: Multi-UAV Pursuit-Evasion Game With Online Motion Planning by Deep Reinforcement Learning