Three-Dimension Trajectory Design for Multi-UAV Wireless Network With Deep Reinforcement Learning

Wenqi Zhang, Qiang Wang, Xiao Liu, Yuanwei Liu, Yue Chen
DOI: https://doi.org/10.1109/tvt.2020.3047800
IF: 6.8
2021-01-01
IEEE Transactions on Vehicular Technology
Abstract: The effective trajectory design of multiple unmanned aerial vehicles (UAVs) is investigated for improving the capacity of the communication system. The aim is to maximize real-time downlink capacity under the coverage constraint by reaping the mobility benefits of UAVs. The problem of three-dimensional (3D) dynamic movement of UAVs under the coverage constraint is formulated as a Constrained Markov Decision Process (CMDP) problem, and a constrained Deep Q-Network (cDQN) algorithm is proposed for solving the formulated problem. In the proposed cDQN model, each UAV acts as an agent to explore and learn its 3D deployment policy. The aim of the proposed cDQN model is to obtain the maximum capacity while attempting to guarantee that all ground terminals (GTs) are covered. In order to satisfy the coverage constraint, a primal-dual method is adopted for training the primal variable and the dual variable (Lagrangian multiplier) in turn. Furthermore, in an effort to reduce the action space of the cDQN algorithm, prior information is utilized by an action filter to eliminate invalid actions. Experimental results demonstrate that the cDQN algorithm is capable of converging after some training steps. Additionally, the UAVs are capable of adapting to the movement of GTs under the coverage constraint according to the 3D deployment policy derived from the proposed cDQN algorithm.
telecommunications; engineering, electrical & electronic; transportation science & technology
What problem does this paper attempt to address?
This paper addresses the trajectory design of multiple unmanned aerial vehicles (UAVs) in wireless networks, with the aim of improving the capacity of the communication system. Specifically, the goal is to maximize the real-time downlink capacity under a coverage constraint by taking advantage of the mobility of the UAVs. The three-dimensional dynamic movement of the UAVs under the coverage constraint is formulated as a constrained Markov decision process (CMDP) problem, and a constrained deep Q-network (cDQN) algorithm is proposed to solve it. In the proposed cDQN model, each UAV acts as an agent that explores and learns its own three-dimensional deployment strategy. The model seeks the maximum capacity while attempting to ensure that all ground terminals (GTs) are covered. To meet the coverage constraint, a primal-dual method alternately trains the primal variables and the dual variables (Lagrangian multipliers). In addition, to reduce the action space of the cDQN algorithm, an action filter uses prior information to eliminate invalid actions. Experimental results show that the cDQN algorithm converges after some training steps, and that the UAVs can adapt to the movement of the GTs under the coverage constraint by following the three-dimensional deployment strategy derived from the proposed cDQN algorithm.
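The two mechanisms named above, the action filter and the alternating primal-dual update, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the action set, the reward, or the update rule, so everything below is an assumption made for illustration. That includes the seven-move action set, the box-shaped flight region used as "prior information", the `uncovered_fraction` cost, and the `tolerance` and `step_size` parameters.

```python
import numpy as np

# Hypothetical action set (an assumption): six unit moves in 3D plus hovering.
ACTIONS = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
    [0, 0, 0],
])


def valid_action_mask(position, lower, upper):
    """Action filter: True for moves that keep the UAV inside the box region."""
    next_positions = position + ACTIONS  # broadcasts to shape (7, 3)
    return np.all((next_positions >= lower) & (next_positions <= upper), axis=1)


def select_action(q_values, mask, epsilon, rng):
    """Epsilon-greedy choice restricted to actions that pass the filter."""
    if rng.random() < epsilon:
        return int(rng.choice(np.flatnonzero(mask)))
    # Invalid actions get -inf so they can never be the greedy choice.
    return int(np.argmax(np.where(mask, q_values, -np.inf)))


def lagrangian_reward(capacity, uncovered_fraction, lam):
    """Primal-step signal: capacity minus the multiplier-weighted constraint cost."""
    return capacity - lam * uncovered_fraction


def dual_update(lam, avg_uncovered_fraction, tolerance, step_size):
    """Dual step: gradient ascent on the multiplier, projected onto lam >= 0."""
    return max(0.0, lam + step_size * (avg_uncovered_fraction - tolerance))
```

In a training loop built on this sketch, the Q-network would be trained on `lagrangian_reward` with the multiplier held fixed, and `dual_update` would then be applied with the network held fixed. The multiplier grows while coverage violations exceed the tolerance and shrinks toward zero once the constraint is met.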