Abstract: In multi-UAV networks, the downlink (DL) and uplink (UL) associations between a UAV and a user equipment (UE) are typically coupled, which restricts each UE to associating with the same UAV for both DL and UL. However, this mode may be inefficient, since UAV networks can be heterogeneous (e.g., multi-tier UAV networks) and can experience high link uncertainty due to UAV mobility. The introduction of full-duplex communication in a multi-UAV network further complicates the UE-UAV association. For this reason, this work introduces DL-UL decoupling (DUDe), which allows each UE to associate with separate UAVs for UL and DL transmissions. Moreover, the UE-UAV association depends on the flight trajectories of the UAVs, which makes the DUDe design challenging. In this article, we study the joint decoupled UL-DL association and trajectory design problem for full-duplex multi-UAV networks. A joint optimization problem is formulated with the objective of maximizing the UEs’ sum-rate in both UL and DL. Since the problem is non-convex with sophisticated states and an individual UAV may not know the reward functions of other UAVs, a robust partially observable Markov decision process (POMDP) model is proposed to characterize the model uncertainty. A multi-agent deep reinforcement learning (MADRL) approach is proposed that enables each UAV to select its policy in a distributed manner. To train the actor-critic neural networks in the MADRL approach, an improved clip and count-based proximal policy optimization (PPO) algorithm is developed. In particular, a modified clip distribution is designed to deal with the hard restrictions between current and old policies, and an intrinsic reward is introduced to enhance the exploration capability. Simulation results illustrate the superiority of the proposed schemes over the benchmarks. The code is publicly available on GitHub (https://github.com/isdai/MADRL-PPO).
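
The abstract builds on two ingredients: the clipped PPO objective and a count-based intrinsic reward. As a rough illustration only, the sketch below shows the standard PPO clipped surrogate for a single sample and a common inverse-square-root visit-count bonus. It does not reproduce the modified clip distribution or the exact intrinsic reward of the proposed algorithm, which the abstract does not specify. The names `clipped_surrogate` and `CountBonus` and the parameters `eps` and `beta` are assumptions made for this example.

```python
import math
from collections import Counter


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for one sample (not the paper's modified clip).

    ratio is pi_new(a|s) / pi_old(a|s); eps is an assumed clip range.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


class CountBonus:
    """Illustrative count-based intrinsic reward: beta / sqrt(N(s))."""

    def __init__(self, beta=0.1):
        self.beta = beta          # assumed scaling factor
        self.counts = Counter()   # visit counts per (hashable) state key

    def __call__(self, state_key):
        self.counts[state_key] += 1
        return self.beta / math.sqrt(self.counts[state_key])


if __name__ == "__main__":
    bonus = CountBonus(beta=0.1)
    print(clipped_surrogate(1.5, 1.0))   # gain capped by the clip range
    print(bonus("cell_3_4"), bonus("cell_3_4"))  # bonus shrinks on revisits
```

In this sketch the bonus would simply be added to the environment reward before computing advantages, so rarely visited states look more attractive to the policy; how the proposed algorithm combines the two terms is not stated in the abstract.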

Air-Ground Coordination Communication by Multi-Agent Deep Reinforcement Learning

Trajectory Design and Access Control for Air–Ground Coordinated Communications System With Multiagent Deep Reinforcement Learning

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding

UAV-Enabled Secure Communications by Multi-Agent Deep Reinforcement Learning

UAV-assisted fair communications for multi-pair users: A multi-agent deep reinforcement learning method

Multi-Agent DRL for Air-to-Ground Communication Planning in UAV-Enabled IoT Networks

Multi-Agent Deep Reinforcement Learning for Secure UAV Communications

Joint UAV trajectory and communication design with heterogeneous multi-agent reinforcement learning

UAV Cooperative Air Combat Maneuvering Confrontation Based on Multi-agent Reinforcement Learning

Multi-Objective Optimization in Air-to-Air Communication System Based on Multi-Agent Deep Reinforcement Learning

Resource Allocation in UAV-D2D Networks: A Scalable Heterogeneous Multi-Agent Deep Reinforcement Learning Approach

Power Allocation Based on Multi-Agent Deep Deterministic Policy Gradient for Underwater Acoustic Communication Networks

Multi-Agent Deep Reinforcement Learning for Joint Decoupled User Association and Trajectory Design in Full-Duplex Multi-UAV Networks

Maximizing UAV Coverage in Maritime Wireless Networks: A Multiagent Reinforcement Learning Approach

UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning

Cooperative Multi-Agent Deep Reinforcement Learning Methods for UAV-aided Mobile Edge Computing Networks

Multiple unmanned aerial vehicle coordinated strikes against ground targets based on an improved multi-agent deep deterministic policy gradient algorithm

Multi-Agent Reinforcement Learning for Cooperative Air Transportation Services in City-Wide Autonomous Urban Air Mobility

Mean Field Deep Reinforcement Learning for Fair and Efficient UAV Control

Group-Based Deep Reinforcement Learning in Multi-UAV Confrontation

Autonomous and cooperative control of UAV cluster with multi-agent reinforcement learning