Abstract: Exploiting unmanned aerial vehicles (UAVs) to execute tasks is gaining growing popularity recently. To solve the underlying task scheduling problem, the deep reinforcement learning (DRL) based methods demonstrate notable advantage over the conventional heuristics as they rely less on hand-engineered rules. However, their decision space will become prohibitively huge as the problem scales up, thus deteriorating the computation efficiency. To alleviate this issue, we propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF), where we decompose the task scheduling of multi-UAV into task allocation and route planning. Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs, and we exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks given the maximum flight distance of the UAV. To effectively train the two models, we design an interactive training strategy (ITS), which includes pre-training, intensive training and alternate training. Experimental results show that our DL-DRL performs favorably against the learning-based and conventional baselines including the OR-Tools, in terms of solution quality and computation efficiency. We also verify the generalization performance of our approach by applying it to larger sizes of up to 1000 tasks. Moreover, we also show via an ablation study that our ITS can help achieve a balance between the performance and training efficiency.
What problem does this paper attempt to address?
### Problems the paper attempts to solve
The paper addresses large-scale task scheduling for multiple unmanned aerial vehicles (UAVs). Specifically, it studies how to maximize the number of executed tasks when each UAV is limited by a maximum flight distance. The problem matters in applications such as package delivery, environmental monitoring, target tracking and reconnaissance, because the schedule directly determines how efficiently and how completely the tasks are carried out.
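To make the objective concrete, the following Python sketch scores a candidate schedule. It assumes (this summary does not specify it) that tasks are points in the plane, that every UAV departs from and returns to a shared depot, and that distances are Euclidean. `route_length` and `executed_tasks` are hypothetical helpers for illustration, not code from the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def route_length(depot: Point, route: List[int], tasks: Dict[int, Point]) -> float:
    """Length of the closed tour depot -> tasks (in order) -> depot (assumed model)."""
    path = [depot] + [tasks[t] for t in route] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def executed_tasks(depot: Point, routes: List[List[int]],
                   tasks: Dict[int, Point], max_dist: float) -> int:
    """Objective sketch: number of distinct tasks covered by the UAV routes.

    Raises ValueError if any route exceeds the maximum flight distance,
    i.e. the schedule is infeasible.
    """
    covered = set()
    for k, route in enumerate(routes):
        if route_length(depot, route, tasks) > max_dist + 1e-9:
            raise ValueError(f"UAV {k} exceeds the maximum flight distance")
        covered.update(route)  # a task visited twice still counts once
    return len(covered)


# Usage: two UAVs, each serving one nearby task; task 2 is out of range.
tasks = {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (3.0, 4.0)}
print(executed_tasks((0.0, 0.0), [[0], [1]], tasks, max_dist=2.5))
```

Each route must stay within the distance budget; the objective then counts how many tasks the feasible routes cover.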
### Background and challenges
1. **Problem complexity**:
- The multi-UAV task scheduling problem can be viewed as a variant of the multiple traveling salesman problem (m-TSP) and is NP-hard. As the problem size increases, the time needed to find a provably optimal solution grows exponentially in the worst case for known exact methods.
- Traditional exact algorithms (e.g., branch-and-bound and dynamic programming) can find the optimal solution, but their computation time makes them impractical for large-scale instances.
- Heuristic algorithms (e.g., genetic algorithms, ant colony optimization and particle swarm optimization) offer a better trade-off between solution quality and computation time on large-scale instances, but they rely on hand-designed rules, which can limit their final performance.
2. **Expansion of decision space**:
- As the problem size increases, the decision space of a deep reinforcement learning (DRL) model also grows dramatically, which lowers computational efficiency and can lead to unstable training or degraded performance.
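A rough count shows how fast the decision space grows. The sketch below counts the ways to split n distinct tasks into m labeled, ordered UAV routes (empty routes allowed), which is (n + m - 1)! / (m - 1)!. It ignores the flight-distance limit and the option of leaving tasks unexecuted, so it only illustrates the growth rate and is not an analysis taken from the paper; `num_schedules` is a hypothetical name.

```python
import math


def num_schedules(n_tasks: int, n_uavs: int) -> int:
    """Ways to arrange n distinct tasks into n_uavs labeled, ordered routes.

    Equals (n + m - 1)! / (m - 1)!; routes may be empty and every task is
    assigned exactly once. Distance limits are ignored in this count.
    """
    return math.factorial(n_tasks + n_uavs - 1) // math.factorial(n_uavs - 1)


for n, m in [(10, 2), (50, 5), (100, 5), (1000, 10)]:
    digits = len(str(num_schedules(n, m)))
    print(f"n={n:4d}, m={m:2d}: roughly 10^{digits - 1} possible schedules")
```

Under a divide-and-conquer decomposition, each route-planning subproblem only orders the tasks allocated to a single UAV (roughly n/m of them when the load is balanced), which is one way to see why splitting the problem eases the burden on each model.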
### Solutions
To address these challenges, the paper proposes a double-level deep reinforcement learning (DL-DRL) method based on a divide-and-conquer framework (DCF).
1. **Divide-and-conquer framework (DCF)**:
- Decompose the multi-UAV task scheduling problem into two sub-problems: task allocation and route planning.
- Each sub-problem has a smaller decision space than the original joint problem, which lets the DRL models handle large-scale instances more efficiently.
2. **Double-level deep reinforcement learning (DL-DRL)**:
- **Upper-level DRL model**: Responsible for task allocation. It uses an encoder based on the self-attention mechanism and a selection decoder to allocate tasks to different UAVs.
- **Lower-level DRL model**: Responsible for route planning. It uses an attention-based encoder-decoder structure to construct each UAV's route, with the objective of maximizing the number of executed tasks within the maximum flight distance.
3. **Interactive training strategy (ITS)**:
- It consists of three stages (pre-training, intensive training and alternate training) that together balance model performance against training efficiency.
- This strategy lets the upper-level and lower-level DRL models be trained effectively while accounting for their interdependence: the quality of an allocation is only revealed by the routes the lower-level model builds from it.
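The allocate-then-route flow of the framework can be sketched as follows. This is a minimal illustration, not the paper's method: the hypothetical `allocate_tasks` (angular sectors around the depot) stands in for the learned upper-level policy, and the hypothetical `plan_route` (greedy nearest feasible task) stands in for the attention-based lower-level policy. The feasibility mask, which drops any task that would leave too little range to return to the depot, is an assumption about how the distance limit could be enforced during route construction.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def allocate_tasks(depot: Point, tasks: Dict[int, Point], n_uavs: int) -> List[List[int]]:
    """Upper-level stand-in (hypothetical): split tasks into n_uavs angular sectors.

    DL-DRL uses a learned encoder-decoder policy at this step instead.
    """
    by_angle = sorted(tasks, key=lambda t: math.atan2(tasks[t][1] - depot[1],
                                                      tasks[t][0] - depot[0]))
    size = math.ceil(len(by_angle) / n_uavs) if by_angle else 0
    return [by_angle[i * size:(i + 1) * size] for i in range(n_uavs)]


def plan_route(depot: Point, subset: List[int], tasks: Dict[int, Point],
               max_dist: float) -> List[int]:
    """Lower-level stand-in (hypothetical): greedy nearest-feasible construction.

    At each step, a task is masked out if flying to it and then back to the
    depot would exceed the remaining flight distance.
    """
    route, pos, used = [], depot, 0.0
    remaining = set(subset)
    while remaining:
        feasible = [t for t in sorted(remaining)
                    if used + math.dist(pos, tasks[t]) + math.dist(tasks[t], depot) <= max_dist]
        if not feasible:
            break  # no unvisited task can be reached and still return home
        nxt = min(feasible, key=lambda t: math.dist(pos, tasks[t]))
        used += math.dist(pos, tasks[nxt])
        pos = tasks[nxt]
        route.append(nxt)
        remaining.remove(nxt)
    return route


def schedule(depot: Point, tasks: Dict[int, Point], n_uavs: int,
             max_dist: float) -> List[List[int]]:
    """Divide and conquer: allocate tasks first, then plan each UAV's route independently."""
    groups = allocate_tasks(depot, tasks, n_uavs)
    return [plan_route(depot, g, tasks, max_dist) for g in groups]


# Usage: task 3 lies too far away to be served within the distance budget.
tasks = {0: (1, 0), 1: (2, 0), 2: (-1, 0), 3: (-5, 0)}
print(schedule((0, 0), tasks, n_uavs=2, max_dist=5.0))
```

In DL-DRL both stand-ins are replaced by trained policy networks; the allocate-then-route structure stays the same, and each route-planning call only sees the tasks allocated to one UAV.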
### Experimental results
The experimental results show that the proposed DL-DRL method outperforms existing learning-based and traditional methods (such as OR-Tools) in terms of solution quality and computational efficiency. In addition, an ablation study verifies the effectiveness of the interactive training strategy, and the method shows good generalization on larger problem instances (up to 1,000 tasks).
### Main contributions
1. Proposed a divide-and-conquer framework (DCF) that decomposes the multi-UAV task scheduling problem into two sub-problems: task allocation and route planning.
2. Designed a double-level deep reinforcement learning method (DL-DRL) to solve these two sub-problems respectively.
3. Proposed an interactive training strategy (ITS) to balance training performance and efficiency.
4. Verified the effectiveness and generalization ability of the method through extensive experiments.
### Summary
The paper tackles large-scale multi-UAV task scheduling with a divide-and-conquer framework and a double-level deep reinforcement learning method, offering a new approach for research in related fields.