Abstract: As an important part of fifth-generation (5G) mobile networks, unmanned aerial vehicles (UAVs) have been applied in various communication scenarios due to their high operability and low cost. In this paper, we investigate a multi-UAV communication system with moving users and consider the co-channel interference caused by the transmissions of all other UAVs. To ensure fairness, we maximize the minimum average user rate over the observed time by jointly optimizing the UAVs' trajectories, transmission power, and user association. Because UAVs can cover a large area, they do not need to move every time the users move. Therefore, we propose a two-timescale structure for this scenario, in which the UAVs' trajectories are optimized based on the channel state information (CSI) in a long timescale, while the transmission power and user association are optimized based on the instantaneous CSI in a short timescale. To effectively tackle this challenging non-convex problem with both discrete and continuous variables, we propose a joint neural network (NN) design, in which a deep reinforcement learning based Pointer Network named advantage pointer-critic (APC) optimizes the discrete variables and a deep-unfolding NN optimizes the continuous variables. Specifically, we first formulate the user association as a Markov decision process and then address it with the APC network, trained by the advantage actor-critic algorithm. The APC network consists of a Pointer Network and a Multilayer Perceptron. For the deep-unfolding NN, we first develop a block coordinate descent based algorithm to optimize the UAVs' trajectories and transmission power, and then unfold this algorithm into a layer-wise NN with trainable parameters. The two networks are jointly trained in an unsupervised fashion. Simulation results validate that the proposed joint NN significantly outperforms the optimization algorithm at a much lower complexity, and that it scales and generalizes well.
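The max-min fairness objective in the abstract can be made concrete with a small sketch. It assumes, hypothetically, a free-space line-of-sight channel gain beta0 / (horizontal distance^2 + H^2) at a fixed UAV altitude H, that each UAV serves at most one user per short-timescale slot, that no user is served by two UAVs in the same slot, and that every other transmitting UAV contributes co-channel interference. The function names, the default `beta0`, and the slot tuple layout are illustrative assumptions, not the paper's formulation. The sketch computes each user's rate log2(1 + SINR) per slot and returns the minimum time-averaged rate, which is the quantity the joint design aims to maximize.

```python
import math


# Hypothetical channel model (not from the paper): free-space LoS gain at a fixed altitude.
def channel_gain(uav, user, height, beta0=1e-3):
    """Gain beta0 / d^2, where d^2 = horizontal distance^2 + height^2."""
    dx, dy = uav[0] - user[0], uav[1] - user[1]
    return beta0 / (dx * dx + dy * dy + height * height)


def slot_rates(uav_pos, user_pos, power, assoc, height, noise):
    """Per-user spectral efficiency (bit/s/Hz) in one short-timescale slot.

    assoc[m] is the index of the user served by UAV m in this slot, or None if
    UAV m is silent. Assumes no user is served by two UAVs in the same slot.
    Every other transmitting UAV is treated as co-channel interference.
    """
    rates = [0.0] * len(user_pos)
    for m, k in enumerate(assoc):
        if k is None:
            continue
        signal = power[m] * channel_gain(uav_pos[m], user_pos[k], height)
        interference = sum(
            power[j] * channel_gain(uav_pos[j], user_pos[k], height)
            for j, kj in enumerate(assoc)
            if j != m and kj is not None
        )
        rates[k] += math.log2(1.0 + signal / (interference + noise))
    return rates


def min_average_rate(slots, num_users, height, noise):
    """Max-min fairness objective: the smallest time-averaged user rate.

    Each slot is a tuple (uav_pos, user_pos, power, assoc); user_pos may change
    from slot to slot because the users move.
    """
    totals = [0.0] * num_users
    for uav_pos, user_pos, power, assoc in slots:
        for k, r in enumerate(slot_rates(uav_pos, user_pos, power, assoc, height, noise)):
            totals[k] += r
    return min(total / len(slots) for total in totals)


# Usage: one UAV alternating between two users over two slots.
example_slots = [
    ([(0.0, 0.0)], [(0.0, 0.0), (5.0, 0.0)], [1.0], [0]),
    ([(0.0, 0.0)], [(0.0, 0.0), (5.0, 0.0)], [1.0], [1]),
]
print(min_average_rate(example_slots, num_users=2, height=10.0, noise=1e-5))
```

Under the two-timescale structure, `uav_pos` would stay fixed across the short slots within one long slot, while `power` and `assoc` change every slot. In the joint NN design, these decisions come from the deep-unfolding NN and the APC network instead of being supplied by hand.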

Proximal Policy Optimization for Multi-rotor UAV Autonomous Guidance, Tracking and Obstacle Avoidance

UAV Cooperative Search Based on Multi-agent Generative Adversarial Imitation Learning

Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance

Proximal policy optimization guidance algorithm for intercepting near-space maneuvering targets

DTPPO: Dual-Transformer Encoder-based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments

Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN-LSTM fusion network

On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration

Obstacle Avoidance for UAS in Continuous Action Space Using Deep Reinforcement Learning

Multi-Agent Proximal Policy Optimization For Data Freshness in UAV-assisted Networks

Mean policy-based proximal policy optimization for maneuvering decision in multi-UAV air combat

Joint Resource Allocation and Trajectory Design for Multi-UAV Systems With Moving Users: Pointer Network and Unfolding

Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework

Robotic arm trajectory tracking method based on improved proximal policy optimization

A hierarchical reinforcement learning method for missile evasion and guidance

Proximal policy optimization with reciprocal velocity obstacle based collision avoidance path planning for multi-unmanned surface vehicles

UCAV Air Combat Maneuver Decisions Based on a Proximal Policy Optimization Algorithm with Situation Reward Shaping

PPO-Exp: Keeping Fixed-Wing UAV Formation with Deep Reinforcement Learning

Multi-UAV obstacle avoidance control via multi-objective social learning pigeon-inspired optimization

End-to-end UAV Intelligent Training via Deep Reinforcement Learning

A multi-robot path-planning algorithm for autonomous navigation using meta-reinforcement learning based on transfer learning

A Graph-Based PPO Approach in Multi-UAV Navigation for Communication Coverage