Multi-Agent Proximal Policy Optimization For Data Freshness in UAV-assisted Networks

Mouhamed Naby Ndiaye, El Houcine Bergou, Hajar El Hammouti

2023-03-15

Abstract: Unmanned aerial vehicles (UAVs) are seen as a promising technology to perform a wide range of tasks in wireless communication networks. In this work, we consider the deployment of a group of UAVs to collect the data generated by IoT devices. Specifically, we focus on the case where the collected data is time-sensitive, and it is critical to maintain its timeliness. Our objective is to optimally design the UAVs' trajectories and the subsets of visited IoT devices such that the global Age-of-Updates (AoU) is minimized. To this end, we formulate the studied problem as a mixed-integer nonlinear programming (MINLP) problem under time and quality of service constraints. To efficiently solve the resulting optimization problem, we investigate the cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose an approach based on the popular on-policy Reinforcement Learning (RL) algorithm Proximal Policy Optimization (PPO). Our approach leverages the centralized training with decentralized execution (CTDE) framework, where the UAVs learn their optimal policies while training a centralized value function. Our simulation results show that the proposed MAPPO approach reduces the global AoU by at least a factor of 1/2 compared to conventional off-policy reinforcement learning approaches.

Optimization and Control, Machine Learning

What problem does this paper attempt to address?

The paper addresses how to jointly design the trajectories of Unmanned Aerial Vehicles (UAVs) and the subsets of Internet of Things (IoT) devices they visit in UAV-assisted networks so that the global Age-of-Updates (AoU) is minimized. It targets time-sensitive data collection, where the freshness of the collected data is crucial. The problem is modeled as a Mixed-Integer Nonlinear Programming (MINLP) problem subject to time and Quality of Service (QoS) constraints.

To solve it, the authors adopt a cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose a Multi-Agent Proximal Policy Optimization (MAPPO) algorithm, which belongs to the family of policy gradient methods. The algorithm follows the Centralized Training with Decentralized Execution (CTDE) paradigm: each UAV learns its policy from local observations, while a single global value function is shared during training to support collaborative decision-making.

Compared to traditional value-based reinforcement learning algorithms, the proposed MAPPO-AoU method is better equipped to handle continuous action spaces and converges faster. In simulations it outperforms the benchmark algorithms, in particular by reducing the global AoU and increasing the number of served IoT devices. The experiments also indicate that MAPPO-AoU lets the UAVs discover strategies that reduce the latency of data updates, so that data stays fresh even when the network contains IoT devices with high data generation rates. In summary, the paper contributes the AoU metric, an MINLP formulation, and a multi-agent reinforcement learning solution for improving data collection efficiency and data freshness in UAV-assisted networks.
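To make the PPO-plus-shared-critic idea in the summary concrete, the following is a minimal sketch of the standard PPO clipped surrogate loss, averaged over several agents that all use advantage estimates from one centralized value function. It is not the authors' implementation: the function names `ppo_clip_loss` and `mappo_actor_loss`, the clipping value `clip_eps=0.2`, and the use of plain NumPy arrays in place of neural networks are all assumptions made for illustration.

```python
import numpy as np


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed value.
    """
    ratio = np.exp(new_logp - old_logp)            # importance ratio
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective, then negate to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))


def mappo_actor_loss(new_logps, old_logps, shared_advantages, clip_eps=0.2):
    """Hypothetical multi-agent wrapper: one clipped loss per agent (UAV),
    all computed with the same advantages from a centralized critic."""
    losses = [
        ppo_clip_loss(n, o, shared_advantages, clip_eps)
        for n, o in zip(new_logps, old_logps)
    ]
    return float(np.mean(losses))


if __name__ == "__main__":
    adv = np.array([1.0, -0.5, 2.0])               # shared advantage estimates
    old = [np.log(np.array([0.5, 0.2, 0.3]))] * 2  # two agents, same old policy
    new = [np.log(np.array([0.6, 0.2, 0.3])),
           np.log(np.array([0.5, 0.1, 0.4]))]
    print(mappo_actor_loss(new, old, adv))
```

The clipping keeps each policy update close to the policy that collected the data, which is what makes on-policy training stable. During execution only the per-agent policies would be needed, in line with the CTDE paradigm described above.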


DTPPO: Dual-Transformer Encoder-based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments

A Graph-Based PPO Approach in Multi-UAV Navigation for Communication Coverage

Age of Information Minimization using Multi-agent UAVs based on AI-Enhanced Mean Field Resource Allocation

On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration

Mean policy-based proximal policy optimization for maneuvering decision in multi-UAV air combat

Proximal Policy Optimization for Multi-rotor UAV Autonomous Guidance, Tracking and Obstacle Avoidance

[Development of specific immunotherapy technics in immediate hypersensitivity].

Optimization for Master-UAV-powered Auxiliary-Aerial-IRS-assisted IoT Networks: An Option-based Multi-agent Hierarchical Deep Reinforcement Learning Approach

Dense Multi-Agent Reinforcement Learning Aided Multi-UAV Information Coverage for Vehicular Networks

Multi-Agent DRL for Air-to-Ground Communication Planning in UAV-Enabled IoT Networks

Multi-UAV Path Learning for Age and Power Optimization in IoT With UAV Battery Recharge

Cooperative Data Collection with Multiple UAVs for Information Freshness in the Internet of Things

An Improved PPO for Multiple Unmanned Aerial Vehicles

UAV-assisted fair communications for multi-pair users: A multi-agent deep reinforcement learning method

Deep Reinforcement Learning-Driven UAV Data Collection Path Planning: A Study on Minimizing AoI

Maximizing UAV Coverage in Maritime Wireless Networks: A Multiagent Reinforcement Learning Approach

Multi-Agent Deep Reinforcement Learning for Joint Decoupled User Association and Trajectory Design in Full-Duplex Multi-UAV Networks

Age-of-Updates Optimization for UAV-assisted Networks

Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework