Multi-User Delay-Constrained Scheduling With Deep Recurrent Reinforcement Learning
Pihe Hu, Yu Chen, Ling Pan, Zhixuan Fang, Fu Xiao, Longbo Huang
DOI: https://doi.org/10.1109/tnet.2024.3359911
2024-01-01
IEEE/ACM Transactions on Networking
Abstract: Multi-user delay-constrained scheduling is a crucial challenge in many real-world applications, such as wireless communication, live streaming, and cloud computing. The scheduler must make real-time decisions that satisfy delay and resource constraints simultaneously, without prior knowledge of system dynamics, which can be time-varying and difficult to estimate. Moreover, many practical scenarios suffer from partial observability due to sensing noise or hidden correlations. To address these challenges, we propose a deep reinforcement learning (DRL) algorithm called Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($\mathtt{RSD4}$) (https://github.com/hupihe/RSD4), a data-driven method based on a Partially Observable Markov Decision Process (POMDP) formulation. $\mathtt{RSD4}$ guarantees the resource and delay constraints through the Lagrangian dual and delay-sensitive queues, respectively. It also handles partial observability efficiently with a memory mechanism enabled by a recurrent neural network (RNN). In addition, it introduces user-level decomposition and node-level merging to support large-scale multi-hop scenarios. Extensive experiments on simulated and real-world datasets demonstrate that $\mathtt{RSD4}$ is robust to system dynamics and partially observable environments and outperforms existing methods.
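The abstract states that $\mathtt{RSD4}$ enforces the resource constraint through a Lagrangian dual and the delay constraint through delay-sensitive queues. The Python sketch below illustrates those two mechanisms in isolation; it is not the authors' implementation. Everything in it is an assumption for illustration: a single user, a hypothetical arrival process of 0 to 5 packets per slot, one resource unit per transmitted packet, a hard delay bound of 3 slots, a dual step size of 0.01, and a hand-written rule (serve a fraction $1-\lambda$ of the backlog) standing in for a learned policy. Packets that outlive their deadline are dropped, and the multiplier $\lambda$ is updated by projected dual ascent, $\lambda \leftarrow \max(0, \lambda + \eta(c_t - C))$, where $c_t$ is the resource used in slot $t$, $C$ is the per-slot budget, and $\eta$ is the step size; a learning agent would maximize the penalized reward $r_t - \lambda c_t$.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    arrival: int   # slot in which the packet arrived
    deadline: int  # last slot in which it may still be delivered


class DelaySensitiveQueue:
    """FIFO queue whose packets are dropped once their hard deadline passes."""

    def __init__(self, delay_bound: int):
        self.delay_bound = delay_bound
        self.packets = deque()
        self.expired = 0

    def arrive(self, t: int, n: int) -> None:
        for _ in range(n):
            self.packets.append(Packet(t, t + self.delay_bound))

    def drop_expired(self, t: int) -> None:
        # With a common delay bound, the head of the FIFO has the earliest deadline.
        while self.packets and self.packets[0].deadline < t:
            self.packets.popleft()
            self.expired += 1

    def serve(self, k: int) -> list:
        return [self.packets.popleft() for _ in range(min(k, len(self.packets)))]


def simulate(slots=5000, budget=2.0, delay_bound=3, step=0.01, seed=0):
    rng = random.Random(seed)
    queue = DelaySensitiveQueue(delay_bound)
    lam = 0.0                # Lagrange multiplier for the average-resource constraint
    total_cost = 0.0
    lagrangian_reward = 0.0  # sum of r_t - lam * c_t, what a learner would maximize
    arrived = 0
    delivered = []           # (service slot, packet) pairs
    for t in range(slots):
        n = rng.randint(0, 5)  # hypothetical arrival process: 0..5 packets per slot
        arrived += n
        queue.arrive(t, n)
        queue.drop_expired(t)
        # Hypothetical stand-in for a learned policy: serve a fraction of the
        # backlog that shrinks as the resource "price" lam grows.
        k = round(len(queue.packets) * max(0.0, 1.0 - lam))
        served = queue.serve(k)
        cost = float(len(served))  # assumption: one resource unit per packet
        delivered.extend((t, p) for p in served)
        lagrangian_reward += len(served) - lam * cost
        total_cost += cost
        # Projected dual ascent: raise lam when over budget, lower it otherwise.
        lam = max(0.0, lam + step * (cost - budget))
    return {
        "delivered": delivered,
        "avg_cost": total_cost / slots,
        "lam": lam,
        "expired": queue.expired,
        "backlog": len(queue.packets),
        "arrived": arrived,
        "lagrangian_reward": lagrangian_reward,
    }


if __name__ == "__main__":
    stats = simulate()
    print(f"avg cost {stats['avg_cost']:.3f}, lambda {stats['lam']:.3f}, "
          f"delivered {len(stats['delivered'])}, expired {stats['expired']}")
```

Because the assumed mean arrival rate (2.5 packets per slot) exceeds the budget of 2 units per slot, the multiplier settles at a positive value and the average resource use is pushed toward the budget, while every delivered packet respects the delay bound by construction. Raising `budget` above the mean arrival rate should instead keep $\lambda$ near zero.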
telecommunications; computer science, theory & methods; engineering, electrical & electronic, hardware & architecture