Abstract: The age of information (AoI) is used to measure the freshness of the data. In IoT networks, the traditional resource management schemes rely on a message exchange between the devices and the base station (BS) before communication, which causes high AoI, high energy consumption, and low reliability. Unmanned aerial vehicles (UAVs) as flying BSs have many advantages in minimizing the AoI, saving energy, and improving throughput. In this paper, we present a novel learning-based framework that estimates the traffic arrival of IoT devices based on Markovian events. The learning proceeds to optimize the trajectory of multiple UAVs and their scheduling policy. First, the BS predicts the future traffic of the devices. We compare two traffic predictors: the forward algorithm (FA) and the long short-term memory (LSTM). Afterward, we propose a deep reinforcement learning (DRL) approach to optimize the policy of each UAV. Finally, we design the reward function for the proposed DRL approach. Simulation results show that the proposed algorithm outperforms the random-walk (RW) baseline model regarding the AoI, scheduling accuracy, and transmission power.

What problem does this paper attempt to address?

This paper addresses the joint optimization of unmanned aerial vehicle (UAV) trajectories and resource scheduling in Internet of Things (IoT) networks, based on predictions of the devices' traffic patterns. The goal is to reduce three quantities: the age of information (AoI), which measures data freshness; the cumulative regret, which counts the cases where resources are allocated to inactive devices while active devices go unserved; and the average transmission power of the devices. Specifically, the paper proposes a learning-based framework that estimates the traffic arrival of IoT devices from Markovian events and then optimizes the trajectories of multiple UAVs and their scheduling policies.

### Main Objectives

1. **Reduce Age of Information (AoI)**: By predicting which devices are about to become active, UAVs can prioritize serving these devices, thereby reducing their AoI.
2. **Reduce Cumulative Regret**: By accurately predicting the activity states of devices, fewer resources are allocated to inactive devices, which lowers the cumulative regret.
3. **Reduce Transmission Power**: By optimizing the trajectories and the resource scheduling policy of the UAVs, the transmission power required by the devices is reduced.

### Method Overview

1. **Traffic Prediction Phase**:
   - Use two methods, the forward algorithm (FA) and the long short-term memory (LSTM) network, to predict the activation probability of devices.
   - Compare the two methods in terms of accuracy, complexity, and other criteria.
2. **UAV Learning Phase**:
   - Use deep reinforcement learning (DRL) to optimize the trajectory and scheduling policy of each UAV.
   - Design a reward function that jointly minimizes the average AoI, cumulative regret, and average transmission power.
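
The traffic prediction phase can be illustrated with a generic forward-algorithm filter. The sketch below is not the paper's implementation: it assumes a single device driven by one hidden Markov chain, and the function name `forward_predict`, the two-state chain, and all numeric values are hypothetical choices made for illustration.

```python
def forward_predict(trans, emit_active, prior, observations):
    """Estimate P(device is active in the next slot | activity seen so far).

    trans[i][j]    : probability of moving from hidden state i to state j
    emit_active[s] : probability that the device is active in hidden state s
    prior[s]       : belief over the hidden state before the first observation
    observations   : past activity indicators (1 = active, 0 = inactive)
    """
    n = len(prior)
    belief = list(prior)
    for obs in observations:
        # Correction step: weight each state by the likelihood of the observation.
        joint = [
            belief[s] * (emit_active[s] if obs == 1 else 1.0 - emit_active[s])
            for s in range(n)
        ]
        total = sum(joint)
        posterior = [j / total for j in joint]
        # Prediction step: propagate the belief one slot through the Markov chain.
        belief = [
            sum(posterior[i] * trans[i][j] for i in range(n)) for j in range(n)
        ]
    # Activation probability under the predicted state distribution.
    return sum(belief[s] * emit_active[s] for s in range(n))


# Hypothetical two-state chain: state 0 = "quiet", state 1 = "event ongoing".
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT_ACTIVE = [0.05, 0.7]
PRIOR = [0.5, 0.5]

p_after_activity = forward_predict(TRANS, EMIT_ACTIVE, PRIOR, [1, 1, 1])
p_after_silence = forward_predict(TRANS, EMIT_ACTIVE, PRIOR, [0, 0, 0])
```

A scheduler could rank devices by this predicted probability and serve the most likely ones first, which is the intuition behind lowering both AoI and regret. An LSTM predictor would replace the explicit model with a learned mapping from the activity history to the same probability.
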
### Key Contributions

- **System Model Design**: A hidden Markov model (HMM) describes the activation states of the devices, which are served by multiple UAVs.
- **Traffic Prediction Method**: Two traffic predictors, FA and LSTM, are proposed and compared from different perspectives.
- **DRL Solution**: A DRL method optimizes the trajectories and scheduling policies of the UAVs, jointly minimizing the average AoI, cumulative regret, and average transmission power.
- **Performance Evaluation**: Simulation results show that the proposed algorithm outperforms the random-walk (RW) baseline model in terms of AoI, scheduling accuracy, and transmission power.

### Formula Summary

- **Channel Gain**:
  \[ g_{u,BS}(t) = g_0 L_{u,BS}^{-2} = \frac{g_0}{\vert h_u - h_{BS}\vert^2 + \|l_u(t)\|^2} \]
- **Energy Consumption**:
  \[ e_u(t) = \frac{E_{\text{max}}\sigma^2}{E}\left(2^{\frac{M}{B}} - 1\right)\frac{1}{g_{u,BS}(t)} \]
- **Motion Energy Consumption**:
  \[ e_F(u, v_u) = \frac{E_{\text{max}}}{E}\left(P_0\left(1 + \frac{3v_u^2}{v_{\text{tip}}^2}\right) + P_1\left(\sqrt{1 + \frac{v_u^4}{4v_0^4}} - \frac{v_u^2}{2v_0^2}\right)^{1/2} + \frac{1}{2}d_0\rho\mu_0 Z v_u^3\right) \]
- **Device Activation Probability**:
  \[ \Pr(w_d(t) = 1 \vert S(t)) = 1 - \prod_{k=1}^{K}(1 - p_{d,k})^{S_k(t)} \]
- **Discrete AoI**:
  \[ A_d(t) = \begin{cases} 1 & \text{if } \alpha_d(t) = 1 \\ \min\{A_{\text{max}}, A_d(t-1) + 1\} & \text{otherwise} \end{cases} \]
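
Three of the formulas above (channel gain, device activation probability, and discrete AoI) translate directly into code. The following is a minimal sketch rather than the authors' simulator: the function names, the assumption that the horizontal position is measured relative to the BS, and all numeric values are hypothetical.

```python
import math


def channel_gain(g0, h_u, h_bs, position):
    """Channel gain g0 / (|h_u - h_BS|^2 + ||l_u(t)||^2).

    position is the horizontal location (x, y), assumed relative to the BS.
    """
    x, y = position
    return g0 / ((h_u - h_bs) ** 2 + math.hypot(x, y) ** 2)


def activation_probability(p_d, events):
    """Pr(w_d(t) = 1 | S(t)) = 1 - prod_k (1 - p_{d,k})^{S_k(t)}.

    p_d[k]    : probability that event k activates device d
    events[k] : 1 if event k is currently active, else 0
    """
    prob_silent = 1.0
    for p, s in zip(p_d, events):
        prob_silent *= (1.0 - p) ** s
    return 1.0 - prob_silent


def update_aoi(age, served, a_max):
    """Discrete AoI: reset to 1 when the device is served, else grow up to a_max."""
    return 1 if served else min(a_max, age + 1)


# Hypothetical values: two events, only the first one active.
p_active = activation_probability([0.5, 0.2], [1, 0])

# AoI of one device over four slots; it is served in the third slot only.
age = 3
history = []
for served in [False, False, True, False]:
    age = update_aoi(age, served, a_max=5)
    history.append(age)
# history is [4, 5, 1, 2]
```

The cap `a_max` keeps the AoI bounded, which keeps the state space finite when the AoI is fed to a DRL agent as part of its observation.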

Traffic Learning and Proactive UAV Trajectory Planning for Data Uplink in Markovian IoT Models

A Learning-Based Trajectory Planning of Multiple UAVs for AoI Minimization in IoT Networks

Model-aided Deep Reinforcement Learning for Sample-efficient UAV Trajectory Design in IoT Networks

Trajectory Design for UAV-Based Internet of Things Data Collection: A Deep Reinforcement Learning Approach

Multi-UAV Path Learning for Age and Power Optimization in IoT With UAV Battery Recharge

Joint Cluster Head Selection and Trajectory Planning in UAV-Aided IoT Networks by Reinforcement Learning with Sequential Model

DRL-UTPS: DRL-Based Trajectory Planning for Unmanned Aerial Vehicles for Data Collection in Dynamic IoT Network

UAV-Aided Lifelong Learning for AoI and Energy Optimization in Non-Stationary IoT Networks

Reinforcement Learning-Based Collision Avoidance and Optimal Trajectory Planning in UAV Communication Networks

Priority-Oriented Trajectory Planning for UAV-Aided Time-Sensitive IoT Networks

3D UAV Trajectory and Data Collection Optimisation via Deep Reinforcement Learning

Learning-Based UAV Path Planning for Data Collection with Integrated Collision Avoidance

Trajectory Planning for UAV-Assisted Data Collection in IoT Network: A Double Deep Q Network Approach

Deep Reinforcement Learning for Joint Trajectory Planning, Transmission Scheduling, and Access Control in UAV-Assisted Wireless Sensor Networks

The UAV Trajectory Optimization for Data Collection from Time-Constrained IoT Devices: A Hierarchical Deep Q-Network Approach

Deep Reinforcement Learning for Aerial Data Collection in Hybrid-Powered NOMA-IoT Networks

Trajectory Design and Resource Allocation for Multi-UAV Networks: Deep Reinforcement Learning Approaches

Deep Reinforcement Learning for Channel and Power Allocation in UAV-enabled IoT Systems

UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

Joint Trajectory and Scheduling Optimization for Age of Synchronization Minimization in UAV-Assisted Networks with Random Updates

Deep Reinforcement Learning for Fresh Data Collection in UAV-assisted IoT Networks