Abstract: Exploiting unmanned aerial vehicles (UAVs) to execute tasks has been gaining popularity. To solve the underlying task scheduling problem, deep reinforcement learning (DRL) based methods demonstrate a notable advantage over conventional heuristics, as they rely less on hand-engineered rules. However, their decision space becomes prohibitively large as the problem scales up, which degrades computation efficiency. To alleviate this issue, we propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF), where we decompose multi-UAV task scheduling into task allocation and route planning. In particular, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs, and we exploit another attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks given the maximum flight distance of the UAV. To effectively train the two models, we design an interactive training strategy (ITS), which includes pre-training, intensive training and alternate training. Experimental results show that our DL-DRL performs favorably against learning-based and conventional baselines, including OR-Tools, in terms of solution quality and computation efficiency. We also verify the generalization performance of our approach by applying it to larger instances of up to 1000 tasks. Moreover, we show via an ablation study that our ITS helps achieve a balance between performance and training efficiency.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper addresses large-scale task scheduling for multiple unmanned aerial vehicles (UAVs). Specifically, it focuses on maximizing the number of tasks executed within the UAVs' limited maximum flight distance. The problem matters in practice, for example in package delivery, environmental monitoring, and target tracking and reconnaissance, because task scheduling directly affects the efficiency and quality of task completion.

### Background and challenges

1. **Problem complexity**:
   - The multi-UAV task scheduling problem can be regarded as a variant of the multiple traveling salesman problem (m-TSP) and is NP-hard. As the problem scale increases, the computation time needed to find the optimal solution grows exponentially.
   - Traditional exact algorithms (such as branch-and-bound and dynamic programming) can find the optimal solution, but their computation time is too long for large-scale problems.
   - Heuristic algorithms (such as genetic algorithms, ant colony optimization, and particle swarm optimization) strike a better balance between solution quality and computation time on large-scale problems, but they require manually designed rules, which may limit the final performance.
2. **Expansion of decision space**:
   - As the problem scale increases, the decision space of deep reinforcement learning (DRL) also expands dramatically, resulting in unstable training or performance degradation.

### Solutions

To address the above challenges, the paper proposes a double-level deep reinforcement learning method (DL-DRL) based on a divide-and-conquer framework (DCF).

1. **Divide-and-conquer framework (DCF)**:
   - Decompose the multi-UAV task scheduling problem into two sub-problems: task allocation and route planning.
   - Decomposing the problem reduces the computational complexity, enabling the DRL models to handle large-scale problems more efficiently.
2. **Double-level deep reinforcement learning (DL-DRL)**:
   - **Upper-level DRL model**: Responsible for task allocation. It uses an encoder based on the self-attention mechanism and a selection decoder to allocate tasks to different UAVs.
   - **Lower-level DRL model**: Responsible for route planning. It uses an attention-based encoder-decoder structure to construct the route of each UAV, with the goal of maximizing the number of tasks executed within the maximum flight distance.
3. **Interactive training strategy (ITS)**:
   - It consists of three stages (pre-training, intensive training, and alternating training) that balance training performance and efficiency.
   - With this strategy, the upper-level and lower-level DRL models are trained so that their mutual influence is taken into account.

### Experimental results

The experimental results show that the proposed DL-DRL method outperforms existing learning-based and traditional methods (such as OR-Tools) in terms of solution quality and computational efficiency. In addition, ablation studies verify the effectiveness of the interactive training strategy, and the method generalizes well to larger problem instances (up to 1,000 tasks).

### Main contributions

1. Proposed a divide-and-conquer framework (DCF) that decomposes the multi-UAV task scheduling problem into two sub-problems: task allocation and route planning.
2. Designed a double-level deep reinforcement learning method (DL-DRL) to solve these two sub-problems respectively.
3. Proposed an interactive training strategy (ITS) to balance training performance and efficiency.
4. Verified the effectiveness and generalization ability of the method through extensive experiments.

### Summary

The paper tackles the large-scale multi-UAV task scheduling problem with a divide-and-conquer framework and a double-level deep reinforcement learning method, providing new ideas and methods for research in related fields.
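### Illustrative sketch of the decomposition

The two-level decomposition described under Solutions can be illustrated with a minimal sketch. This is not the paper's method: both learned policy networks are replaced by simple hand-written stand-ins (an angular sweep for allocation, a nearest-neighbor rule for routing). The function names, the single shared depot, the requirement to return to the depot, and the 2D Euclidean distances are all assumptions made for illustration.

```python
import math
import random

Point = tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def allocate_tasks(tasks: list[Point], depot: Point, num_uavs: int) -> list[list[int]]:
    """Upper level (hypothetical stand-in for the allocation policy):
    sort tasks by polar angle around the depot and hand each UAV one sector."""
    order = sorted(
        range(len(tasks)),
        key=lambda i: math.atan2(tasks[i][1] - depot[1], tasks[i][0] - depot[0]),
    )
    chunk = math.ceil(len(order) / num_uavs)
    return [order[k * chunk:(k + 1) * chunk] for k in range(num_uavs)]


def plan_route(tasks: list[Point], depot: Point, assigned: list[int],
               max_distance: float) -> list[int]:
    """Lower level (hypothetical stand-in for the routing policy):
    repeatedly visit the nearest task that still leaves enough range
    to fly back to the depot (assumed constraint)."""
    route: list[int] = []
    remaining = set(assigned)
    position, travelled = depot, 0.0
    while True:
        feasible = [
            i for i in remaining
            if travelled + dist(position, tasks[i]) + dist(tasks[i], depot) <= max_distance
        ]
        if not feasible:
            break
        nxt = min(feasible, key=lambda i: dist(position, tasks[i]))
        travelled += dist(position, tasks[nxt])
        position = tasks[nxt]
        remaining.remove(nxt)
        route.append(nxt)
    return route


def route_length(tasks: list[Point], depot: Point, route: list[int]) -> float:
    """Total flight distance of a depot -> tasks -> depot tour."""
    stops = [depot] + [tasks[i] for i in route] + [depot]
    return sum(dist(a, b) for a, b in zip(stops, stops[1:]))


def schedule(tasks: list[Point], depot: Point, num_uavs: int,
             max_distance: float) -> list[list[int]]:
    """Divide and conquer: allocate first, then route each UAV independently."""
    allocation = allocate_tasks(tasks, depot, num_uavs)
    return [plan_route(tasks, depot, assigned, max_distance) for assigned in allocation]


# Small demo on random tasks in the unit square (all values are illustrative).
random.seed(0)
demo_tasks = [(random.random(), random.random()) for _ in range(100)]
demo_routes = schedule(demo_tasks, depot=(0.5, 0.5), num_uavs=4, max_distance=2.0)
print("executed tasks:", sum(len(r) for r in demo_routes))
```

Because each UAV only sees its own subset of tasks, the routing step works on a much smaller problem than the full instance, which is the efficiency argument behind the framework. In DL-DRL the two stand-in functions would be learned policies trained against the same objective: the number of executed tasks.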

DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Deep Reinforcement Learning Approach with Multiple Experience Pools for UAV's Autonomous Motion Planning in Complex Unknown Environments

Digital Twin Assisted Task Assignment in Multi-UAV Systems: A Deep Reinforcement Learning Approach

Deep reinforcement learning for unmanned aerial vehicles cluster task allocation

Enabling Efficient Scheduling in Large-Scale UAV-Assisted Mobile-Edge Computing via Hierarchical Reinforcement Learning

Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments

A RDA-Based Deep Reinforcement Learning Approach for Autonomous Motion Planning of UAV in Dynamic Unknown Environments

Autonomous Navigation of UAVs in Large-Scale Complex Environments: A Deep Reinforcement Learning Approach

Distributed Energy-Efficient Multi-UAV Navigation for Long-Term Communication Coverage by Deep Reinforcement Learning

Deep Reinforcement Learning for Intelligent Dual-UAV Reconnaissance Mission Planning

Deep Reinforcement Learning Enabled Multi-UAV Scheduling for Disaster Data Collection With Time-Varying Value

Unmanned aerial vehicle–human collaboration route planning for intelligent infrastructure inspection

Deep Reinforcement Learning for UAV Intelligent Mission Planning

Joint Optimization of Multi-UAV Deployment and User Association Via Deep Reinforcement Learning for Long-Term Communication Coverage

Deep Reinforcement Learning for Multi-UAVs Collaborative Task Assignment in Logistic Scenarios

Reinforcement Learning Assisted Multi-UAV Task Allocation and Path Planning for IIoT

Maximizing UAV Coverage in Maritime Wireless Networks: A Multiagent Reinforcement Learning Approach

Large-scale Power Inspection: A Deep Reinforcement Learning Approach

Multi-UAV-assisted computation offloading in DT-based networks: A distributed deep reinforcement learning approach

Multi-UAV simultaneous target assignment and path planning based on deep reinforcement learning in dynamic multiple obstacles environments

A UAV Path Planning Method Based on Deep Reinforcement Learning