AoI optimal UAV trajectory planning: A Deep Recurrent Reinforcement Learning Approach

Mengjie Wu, Huijia Chi, Shuying Gan, Xijun Wang, Chao Xu
DOI: https://doi.org/10.1109/PIMRC50174.2021.9569429
2021-01-01
Abstract: In this paper, we consider an unmanned aerial vehicle (UAV)-assisted IoT network and study the trajectory planning problem of optimizing information freshness, measured by the age of information (AoI), where the update arrivals at the IoT devices are stochastic and unknown to the UAV. To this end, we first formulate the dynamic UAV trajectory planning problem as a Partially Observable Markov Decision Process (POMDP) with non-uniform time steps, in which the set of valid actions is coupled with the agent's observations. Then, we devise a deep recurrent reinforcement learning (DRRL) algorithm to find the policy that minimizes the expected weighted average AoI. The algorithm uses a modified discount mechanism to handle the non-uniform time steps and an action elimination mechanism to address the coupling between valid actions and observations. Finally, simulations validate the effectiveness of the proposed algorithm by comparing it with baseline strategies.
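The abstract names two mechanisms without giving their form, so the following is only a minimal sketch of how they are commonly realized, not the authors' implementation. It assumes that the modified discount raises the discount factor to the power of the step duration (the usual semi-Markov convention), that action elimination is done by masking invalid actions before taking the maximum, and that the reward is a negative weighted AoI. The function names, the `duration` argument, and the default `gamma` are all hypothetical.

```python
import math


def masked_argmax(q_values, valid_mask):
    """Return the index of the largest Q-value among valid actions only.

    Assumption: action elimination is modeled as a boolean mask derived
    from the current observation; invalid actions are never selected.
    """
    best_idx, best_q = None, -math.inf
    for idx, (q, ok) in enumerate(zip(q_values, valid_mask)):
        if ok and q > best_q:
            best_idx, best_q = idx, q
    if best_idx is None:
        raise ValueError("no valid action available")
    return best_idx


def td_target(reward, duration, next_q_values, next_valid_mask,
              gamma=0.99, done=False):
    """One-step Q-learning target with a duration-aware discount.

    Assumption: a step that lasts `duration` time units is discounted by
    gamma ** duration, so long flights are discounted more than short ones.
    """
    if done:
        return reward
    best = masked_argmax(next_q_values, next_valid_mask)
    return reward + (gamma ** duration) * next_q_values[best]


# Hypothetical numbers: action 1 has the highest Q-value but is invalid,
# so the target bootstraps from action 2 instead.
target = td_target(reward=-2.0, duration=3,
                   next_q_values=[1.0, 5.0, 3.0],
                   next_valid_mask=[True, False, True],
                   gamma=0.5)
```

In a recurrent agent the Q-values would come from a network that carries a hidden state across observations; that part is omitted here because the abstract does not describe the architecture.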