Abstract: The age of information (AoI) is used to measure the freshness of the data. In IoT networks, the traditional resource management schemes rely on a message exchange between the devices and the base station (BS) before communication which causes high AoI, high energy consumption, and low reliability. Unmanned aerial vehicles (UAVs) as flying BSs have many advantages in minimizing the AoI, energy-saving, and throughput improvement. In this paper, we present a novel learning-based framework that estimates the traffic arrival of IoT devices based on Markovian events. The learning proceeds to optimize the trajectory of multiple UAVs and their scheduling policy. First, the BS predicts the future traffic of the devices. We compare two traffic predictors: the forward algorithm (FA) and the long short-term memory (LSTM). Afterward, we propose a deep reinforcement learning (DRL) approach to optimize the optimal policy of each UAV. Finally, we manipulate the optimum reward function for the proposed DRL approach. Simulation results show that the proposed algorithm outperforms the random-walk (RW) baseline model regarding the AoI, scheduling accuracy, and transmission power.
What problem does this paper attempt to address?
The paper addresses resource management in Internet of Things (IoT) networks: by predicting the traffic patterns of the devices, it optimizes the trajectories of unmanned aerial vehicles (UAVs) and their resource scheduling policies. The goal is to reduce three quantities: the age of information (AoI), which measures data freshness; the cumulative regret, which counts the cases where resources are allocated to inactive devices while active devices go unserved; and the average transmission power of the devices. Specifically, the paper proposes a learning-based framework that estimates the traffic arrival of IoT devices from Markovian events and then optimizes the trajectories of multiple UAVs and their scheduling policies.
### Main Objectives
1. **Reduce Age of Information (AoI)**: By predicting which devices are about to become active, UAVs can give priority to serving these devices, thereby reducing their AoI.
2. **Reduce Cumulative Regret**: By accurately predicting the activity states of devices, the scheduler allocates fewer resources to inactive devices, which lowers the cumulative regret.
3. **Reduce Transmission Power**: By optimizing the trajectory and resource scheduling strategy of UAVs, reduce the transmission power requirements of devices.
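
To make the regret objective concrete, here is a minimal Python sketch. The summary only describes regret informally (resources given to inactive devices while active devices go unserved), so the per-slot counting rule below and the names `slot_regret` and `cumulative_regret` are illustrative assumptions, not the paper's exact definition.

```python
def slot_regret(scheduled, active):
    """Regret for one time slot (assumed counting rule, for illustration).

    scheduled: set of device ids that were granted a resource in this slot
    active:    set of device ids that actually had data to send
    """
    wasted = scheduled - active   # resources given to inactive devices
    missed = active - scheduled   # active devices left unserved
    # A wasted resource counts as regret only if it could have served
    # an active device that was missed in the same slot.
    return min(len(wasted), len(missed))


def cumulative_regret(schedule_history, activity_history):
    """Sum the per-slot regret over all slots of an episode."""
    return sum(
        slot_regret(scheduled, active)
        for scheduled, active in zip(schedule_history, activity_history)
    )
```

For example, scheduling devices 1, 2, and 3 when only devices 2 and 4 are active wastes two resources but misses one active device, so this rule assigns a regret of 1 to that slot.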
### Method Overview
1. **Traffic Prediction Phase**:
- Use two methods, the forward algorithm (FA) and the long short-term memory (LSTM) network, to predict the activation probability of devices.
- Compare the two methods in terms of accuracy and complexity.
2. **UAV Learning Phase**:
- Use the deep reinforcement learning (DRL) method to optimize the trajectory and scheduling strategy of each UAV.
- Design an optimal reward function to jointly minimize the average AoI, cumulative regret, and average transmission power.
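
The traffic prediction phase can be illustrated with a generic forward-algorithm sketch for a small hidden Markov model. This is a textbook filter followed by a one-step-ahead prediction, not the paper's exact predictor: the two-state model, the matrices `pi`, `A`, and `B`, and the function name `forward_predict` are assumptions chosen for illustration.

```python
def forward_predict(pi, A, B, observations):
    """Filter an HMM with the forward algorithm, then predict one step ahead.

    pi: initial state distribution, pi[i] = Pr(state_0 = i)
    A:  transition matrix, A[i][j] = Pr(next state = j | current state = i)
    B:  emission matrix, B[i][o] = Pr(observation = o | state = i)
    observations: non-empty list of symbols (0 = silent, 1 = active)
    Returns (predicted next-state distribution, Pr(next observation = 1)).
    Assumes every observation has non-zero likelihood under the model.
    """
    n = len(pi)
    # Initialisation: weight the prior by the first observation's likelihood.
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    # Recursion: propagate through A, then weight by each new observation.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
            for j in range(n)
        ]
        total = sum(alpha)
        alpha = [a / total for a in alpha]  # normalise to avoid underflow
    # One-step-ahead prediction of the hidden state and of the activation.
    next_state = [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]
    p_active = sum(next_state[j] * B[j][1] for j in range(n))
    return next_state, p_active


# Hypothetical two-state model: state 0 = "quiet", state 1 = "busy".
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.95, 0.05], [0.1, 0.9]]
_, p_after_activity = forward_predict(pi, A, B, [1, 1])
_, p_after_silence = forward_predict(pi, A, B, [0, 0])
```

A scheduler could rank devices by the predicted activation probability and serve the most likely ones first; an LSTM predictor would replace the hand-specified matrices with a learned sequence model.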
### Key Contributions
- **System Model Design**: Use the hidden Markov model (HMM) to describe the activation states of devices, assuming that multiple UAVs serve the devices.
- **Traffic Prediction Method**: Propose two traffic prediction methods, FA and LSTM, and compare them from different perspectives.
- **DRL Solution**: Propose a DRL method to optimize the trajectory and scheduling strategy of each UAV, jointly minimizing the average AoI, cumulative regret, and average transmission power.
- **Performance Evaluation**: The simulation results show that the proposed algorithm is superior to the random walk (RW) baseline model in terms of AoI, scheduling accuracy, and transmission power.
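
One simple way to express the joint objective as a DRL reward is a negative weighted sum of the three quantities. The summary does not give the paper's actual reward function, so the linear form, the weights `w_aoi`, `w_regret`, and `w_power`, and the function name `joint_reward` below are purely hypothetical.

```python
def joint_reward(ages, regret, powers, w_aoi=1.0, w_regret=1.0, w_power=1.0):
    """Negative weighted cost: a higher reward means lower AoI, regret, and power.

    ages:   current AoI of each device (non-empty list)
    regret: regret incurred in the current slot
    powers: transmission power used by each device (non-empty list)
    The weights are hypothetical tuning knobs, not values from the paper.
    """
    mean_aoi = sum(ages) / len(ages)
    mean_power = sum(powers) / len(powers)
    return -(w_aoi * mean_aoi + w_regret * regret + w_power * mean_power)
```

In practice the three terms have different units and scales, so the weights (or a normalisation of each term) would need tuning before such a reward is useful for training.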
### Formula Summary
- **Channel Gain**:
\[
g_{u,BS}(t)=g_0L_{u,BS}^{-2}=\frac{g_0}{\vert h_u - h_{BS}\vert^2+\|l_u(t)\|^2}
\]
- **Energy Consumption**:
\[
e_u(t)=\frac{E_{\text{max}}\sigma^2}{E}\left(2^{\frac{M}{B}}-1\right)\frac{1}{g_{u,BS}(t)}
\]
- **Motion Energy Consumption**:
\[
e_F(u, v_u)=\frac{E_{\text{max}}}{E}\left(P_0\left(1+\frac{3v_u^2}{v_{\text{tip}}^2}\right)+P_1\sqrt{\sqrt{1+\frac{v_u^4}{4v_0^4}}-\frac{v_u^2}{2v_0^2}}+\frac{1}{2}d_0\rho\mu_0Z v_u^3\right)
\]
- **Device Activation Probability**:
\[
\Pr(w_d(t) = 1\vert S(t))=1-\prod_{k = 1}^K(1 - p_{d,k})^{S_k(t)}
\]
- **Discrete AoI**:
\[
A_d(t)=\begin{cases}
1&\text{if }\alpha_d(t)=1\\
\min\{A_{\text{max}}, A_d(t - 1)+1\}&\text{otherwise}
\end{cases}
\]
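
As a rough illustration, the channel gain, device activation probability, and discrete AoI formulas above translate directly into code. The function and parameter names in this Python sketch are hypothetical; `horizontal_distance` stands for the norm \(\|l_u(t)\|\), and `events` for the event indicators \(S_k(t)\).

```python
def channel_gain(g0, h_u, h_bs, horizontal_distance):
    """g = g0 / (|h_u - h_BS|^2 + ||l_u||^2), i.e. g0 over squared distance."""
    return g0 / ((h_u - h_bs) ** 2 + horizontal_distance ** 2)


def activation_probability(p_d, events):
    """Pr(w_d = 1 | S) = 1 - prod_k (1 - p_{d,k})^{S_k}.

    p_d:    p_d[k] is the probability that event k activates the device
    events: events[k] is 1 if event k is currently on, else 0
    """
    stay_silent = 1.0
    for p, s in zip(p_d, events):
        stay_silent *= (1.0 - p) ** s  # inactive events (s = 0) contribute 1
    return 1.0 - stay_silent


def update_aoi(prev_age, served, a_max):
    """Reset the age to 1 when the device is served, else age by one slot, capped at a_max."""
    return 1 if served else min(a_max, prev_age + 1)
```

For instance, a device triggered by two active events with probabilities 0.5 and 0.2 activates with probability 1 - 0.5 × 0.8 = 0.6, and an unserved device whose age already equals `a_max` stays at `a_max`.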