Trajectory Design and Access Control for Air–Ground Coordinated Communications System With Multiagent Deep Reinforcement Learning

Ruijin Ding, Yadong Xu, Feifei Gao, Xuemin Shen
DOI: https://doi.org/10.1109/jiot.2021.3062091
IF: 10.6
2022-04-15
IEEE Internet of Things Journal
Abstract: Unmanned-aerial-vehicle (UAV)-assisted communications has attracted increasing attention recently. This article investigates an air–ground coordinated communications system, in which the trajectories of air UAV base stations (UAV-BSs) and the access control of ground users (GUs) are jointly optimized. We formulate this optimization problem as a mixed cooperative–competitive game, where each GU competes for the limited resources of UAV-BSs to maximize its own throughput by accessing a suitable UAV-BS, and UAV-BSs cooperate with each other and design their trajectories to maximize the defined fair throughput, which improves the total throughput while keeping the GUs' fairness. Moreover, the action space of GUs is discrete, while that of UAV-BSs is continuous. To tackle this hybrid action space issue, we transform the discrete actions into continuous action probabilities and propose a multiagent deep reinforcement learning (MADRL) approach, named air–ground probabilistic multiagent deep deterministic policy gradient (AG-PMADDPG). With well-designed rewards, AG-PMADDPG can coordinate two types of agents, UAV-BSs and GUs, to achieve their own objectives based on local observations. Simulation results demonstrate that AG-PMADDPG can outperform the benchmark algorithms in terms of throughput and fairness.
computer science, information systems; telecommunications; engineering, electrical & electronic
What problem does this paper attempt to address?
This paper asks how to jointly optimize the trajectory design of unmanned aerial vehicle base stations (UAV-BSs) and the access control of ground users (GUs) in a UAV-assisted communication system. Specifically, it applies multiagent deep reinforcement learning (MADRL) to an air–ground coordinated communications system, choosing the flight paths of the UAV-BSs and the access selections of the GUs so that the system delivers high throughput while serving GUs fairly.

The main challenges are:

- **Hybrid action space**: The action space of the GUs is discrete, while that of the UAV-BSs is continuous, so a single learning method has to handle both kinds of action.
- **Non-convex optimization problem**: The trajectory design of the UAV-BSs is a sequential optimization problem with a large number of decision variables. It is non-convex and therefore very difficult to solve directly.
- **Multi-objective optimization**: GUs and UAV-BSs have different objectives. Each GU aims to maximize its own long-term throughput, while the UAV-BSs aim to maximize the defined fair throughput, that is, to increase the total throughput while maintaining fairness among GUs.

To address these challenges, the paper proposes AG-PMADDPG (air–ground probabilistic multiagent deep deterministic policy gradient). It handles the hybrid action space by transforming the discrete actions into continuous action probabilities, and it coordinates the two types of agents (UAV-BSs and GUs) through reward design, so that each agent pursues its own objective based on local observations.
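The idea of turning a GU's discrete access choice into continuous action probabilities can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes that a GU's actor produces one raw score per UAV-BS, that a softmax maps those scores to probabilities, and that the access decision is sampled from them. The function names `access_probabilities` and `sample_access` and the example scores are hypothetical.

```python
import math
import random


def access_probabilities(scores):
    """Softmax: map one raw score per UAV-BS to a probability vector.

    Assumption: the source only says discrete actions become continuous
    action probabilities; softmax is one common way to do that.
    """
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_access(probs, rng):
    """Draw the index of the UAV-BS that the GU tries to access."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical raw actor outputs of one GU for three UAV-BSs.
scores = [2.0, 0.5, -1.0]
probs = access_probabilities(scores)  # continuous action (probability vector)
rng = random.Random(0)                # fixed seed, for reproducibility only
chosen = sample_access(probs, rng)    # discrete access decision
print(probs, chosen)
```

Under these assumptions, the probability vector is a continuous quantity that a deterministic-policy-gradient method can work with, while the sampled index is what the GU actually does in the environment.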