Deep Reinforcement Learning for Fresh Data Collection in UAV-assisted IoT Networks

Mengjie Yi, Xijun Wang, Juan Liu, Yan Zhang, Bo Bai
DOI: https://doi.org/10.48550/arXiv.2003.00391
2020-03-01
Abstract: Due to the flexibility and low operational cost, dispatching unmanned aerial vehicles (UAVs) to collect information from distributed sensors is expected to be a promising solution in Internet of Things (IoT), especially for time-critical applications. How to maintain the information freshness is a challenging issue. In this paper, we investigate the fresh data collection problem in UAV-assisted IoT networks. Particularly, the UAV flies towards the sensors to collect status update packets within a given duration while maintaining a non-negative residual energy. We formulate a Markov Decision Process (MDP) to find the optimal flight trajectory of the UAV and transmission scheduling of the sensors that minimizes the weighted sum of the age of information (AoI). A UAV-assisted data collection algorithm based on deep reinforcement learning (DRL) is further proposed to overcome the curse of dimensionality. Extensive simulation results demonstrate that the proposed DRL-based algorithm can significantly reduce the weighted sum of the AoI compared to other baseline algorithms.
Information Theory, Networking and Internet Architecture
What problem does this paper attempt to address?
This paper addresses the information freshness problem in data collection for unmanned aerial vehicle (UAV)-assisted Internet of Things (IoT) networks. Specifically, the objective is to minimize the weighted sum of the Age of Information (AoI) of all sensors within a given time by optimizing the flight trajectory of the UAV and the transmission scheduling of the sensors, while ensuring that the residual energy of the UAV stays non-negative.

### Research Background and Problem Description

Thanks to their flexibility and low operating cost, UAVs have become a promising solution for collecting information from distributed sensors in the Internet of Things, especially in time-critical applications. Maintaining the freshness of that information, however, is a challenge. Freshness is measured by the Age of Information (AoI), defined as the time elapsed since the generation of the most recently received data packet.

### Main Contributions of the Paper

1. **Problem Modeling**:
   - The UAV departs from a starting point, flies towards the sensors to collect status-update packets, and arrives at the final destination within a given time.
   - The UAV must maintain non-negative residual energy throughout the flight.
   - The problem is modeled as a finite-horizon Markov decision process (MDP), with the goal of minimizing the weighted sum of the AoI of all sensors.
2. **Algorithm Design**:
   - A UAV-assisted data collection algorithm based on deep reinforcement learning (DRL) is proposed to overcome the computational complexity caused by the high-dimensional state space.
   - A deep Q-network (DQN) estimates the state-action value function, and learning efficiency and stability are improved through experience replay and an ε-greedy policy.
3. **Simulation Results**:
   - The simulation results show that the proposed DRL algorithm significantly outperforms the baseline algorithms, such as the AoI-based and distance-based algorithms, in reducing the weighted AoI.
   - As the coverage radius of the sensors increases, the average AoI decreases. As the number of sensors increases, the average AoI increases, because the UAV needs to fly a longer distance to collect packets and each sensor waits longer to be updated.

### Conclusion

By modeling the UAV-assisted data collection problem as a finite-horizon MDP and designing a DRL-based algorithm, the paper tackles the minimization of the weighted AoI under time and energy constraints. The simulation results support the effectiveness of the algorithm and its advantage over the baselines.
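
To make the quantities in the summary above more concrete, here is a minimal Python sketch of the weighted AoI and of ε-greedy action selection. It is an illustration under stated assumptions, not the paper's implementation: time is assumed to be slotted, each sensor's AoI is assumed to grow by one slot per step and to reset to 1 when its packet is collected, and the function names `step_aoi`, `weighted_aoi`, and `epsilon_greedy` are hypothetical. The Q-values are passed in as a plain list, standing in for the output of a DQN.

```python
import random


def step_aoi(aoi, collected):
    """Advance every sensor's AoI by one slot (assumed slotted time).

    aoi: list of current ages, one per sensor, in slots.
    collected: set of sensor indices whose packet was received this slot.
    Assumption: a collected sensor's age resets to 1; all others grow by 1.
    """
    return [1 if i in collected else age + 1 for i, age in enumerate(aoi)]


def weighted_aoi(aoi, weights):
    """Weighted sum of the per-sensor AoI, the quantity to be minimized."""
    return sum(w * a for w, a in zip(weights, aoi))


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: list of estimated action values (a stand-in for DQN output).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


if __name__ == "__main__":
    ages = step_aoi([3, 5, 2], collected={1})  # sensor 1 is served this slot
    print(ages, weighted_aoi(ages, [2, 1, 1]))
    print(epsilon_greedy([0.1, 0.9, 0.4], epsilon=0.0))
```

With `epsilon=0.0` the selection is purely greedy, while larger values trade exploitation for exploration during training. In a reinforcement learning setup along these lines, the per-slot cost fed to the learner would typically be the value returned by `weighted_aoi`.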