Multi-Agent Proximal Policy Optimization For Data Freshness in UAV-assisted Networks

Mouhamed Naby Ndiaye, El Houcine Bergou, Hajar El Hammouti
2023-03-15
Abstract: Unmanned aerial vehicles (UAVs) are seen as a promising technology to perform a wide range of tasks in wireless communication networks. In this work, we consider the deployment of a group of UAVs to collect the data generated by IoT devices. Specifically, we focus on the case where the collected data is time-sensitive and maintaining its timeliness is critical. Our objective is to optimally design the UAVs' trajectories and the subsets of visited IoT devices such that the global Age-of-Updates (AoU) is minimized. To this end, we formulate the studied problem as a mixed-integer nonlinear programming (MINLP) problem under time and quality-of-service constraints. To efficiently solve the resulting optimization problem, we investigate the cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose an approach based on the popular on-policy Reinforcement Learning (RL) algorithm Proximal Policy Optimization (PPO). Our approach leverages the centralized training decentralized execution (CTDE) framework, where the UAVs learn their optimal policies while training a centralized value function. Our simulation results show that the proposed MAPPO approach reduces the global AoU by at least a factor of 1/2 compared to conventional off-policy reinforcement learning approaches.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to jointly optimize the trajectories of Unmanned Aerial Vehicles (UAVs) and the subsets of Internet of Things (IoT) devices they visit in UAV-assisted networks so that the global Age-of-Updates (AoU) is minimized. It targets time-sensitive data collection, where the freshness of the collected data is crucial. The problem is formulated as a Mixed-Integer Nonlinear Programming (MINLP) problem subject to time and Quality of Service (QoS) constraints.

To solve it, the authors adopt a cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose a Multi-Agent Proximal Policy Optimization algorithm (MAPPO-AoU), which builds on the policy-gradient method PPO. The algorithm follows the Centralized Training with Decentralized Execution (CTDE) paradigm: each UAV learns its policy from local observations, while a centralized value function shared across UAVs is trained alongside to support cooperative decision-making.

Compared with traditional value-based reinforcement learning algorithms, MAPPO-AoU is better equipped to handle continuous action spaces and converges faster. In simulation it outperforms the benchmark algorithms, both reducing the global AoU and increasing the number of served IoT devices. The experiments also indicate that the UAVs discover effective strategies for reducing the latency of data updates, keeping data fresh even when the network contains IoT devices with high data generation rates. In summary, the paper contributes the AoU metric, an MINLP formulation, and a multi-agent reinforcement learning solution for improving data collection efficiency and data freshness in UAV-assisted networks.
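
To make the PPO component mentioned above more concrete, the following is a minimal sketch of the standard PPO clipped surrogate objective, which a MAPPO-style method would typically evaluate per UAV using advantages estimated by the centralized value function. It is not the paper's implementation: the function name `clipped_surrogate`, its argument names, and the default `clip_eps=0.2` are illustrative assumptions, and the paper's reward, network architecture, and hyperparameters are not given in this summary.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates, e.g. from a centralized critic.
    clip_eps: clipping range; 0.2 is a common default, assumed here.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and old policy for this sample.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# One sample whose action became twice as likely, with positive advantage:
# the gain is capped at (1 + clip_eps) * advantage.
value = clipped_surrogate([math.log(2.0)], [0.0], [1.0])
```

The clipping removes the incentive to move the new policy far from the one that collected the data, which is what makes several gradient steps on the same batch reasonably safe. When the advantage is negative, the unclipped term is kept if it is the smaller one, so harmful updates are still fully penalized.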