DRL-TAL: Deep Reinforcement Learning-Based Traffic-Aware Load Balancing in Data Center Networks

Guoyong Jiang, Wenting Wei, Kun Wang, Chengding Pang, Yong Liu
DOI: https://doi.org/10.1109/globecom54140.2023.10437481
2023-01-01
Abstract: Load balancing in data center networks is crucial for utilizing network resources effectively and enhancing Quality of Service (QoS). In particular, flowlet-level load balancing has proven effective at reducing latency and increasing throughput simultaneously. However, most existing schemes rely on an empirically chosen static timeout and suffer performance degradation in dynamic network scenarios, because the static timeout does not match changing traffic conditions. To address this problem, we propose a Deep Reinforcement Learning-Based Traffic-Aware Load Balancing scheme (DRL-TAL), which uses deep reinforcement learning (DRL) to update the flowlet timeout adaptively. An agent based on the deep deterministic policy gradient (DDPG) algorithm continuously senses network throughput and dynamically generates the timeout threshold for the next time slot. Elephant flows are forwarded at flowlet granularity to balance throughput against packet disorder, with the flowlet timeout set by the threshold the agent generates. Mice flows are forwarded at packet granularity, with each packet sent to the port with the smallest queue length to ensure a shorter flow completion time. The results show that, in the symmetric topology, DRL-TAL achieves no packet loss and minimal disorder under high load compared to state-of-the-art schemes. Moreover, in the asymmetric topology, it reduces flow completion time by up to 45% compared to CONGA.
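The forwarding rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Switch`, `FlowState`, and `ELEPHANT_BYTES` are hypothetical, as is classifying a flow as an elephant by a byte-count threshold. Picking the shortest queue for a new flowlet is also an assumption, since the abstract states that rule only for mice flows. The timeout is a fixed field here, whereas in DRL-TAL it would be refreshed each time slot by the DDPG agent.

```python
from dataclasses import dataclass, field

# Hypothetical byte count above which a flow is treated as an elephant.
ELEPHANT_BYTES = 100_000


@dataclass
class FlowState:
    bytes_seen: int = 0
    last_seen: float = 0.0
    port: int = -1  # -1 means no port has been assigned yet


@dataclass
class Switch:
    queue_len: list  # per-port queue lengths, in packets
    timeout: float   # flowlet timeout; set by the agent in DRL-TAL
    flows: dict = field(default_factory=dict)

    def least_loaded_port(self) -> int:
        # Index of the port with the smallest queue (first one on ties).
        return min(range(len(self.queue_len)), key=self.queue_len.__getitem__)

    def forward(self, flow_id: str, size: int, now: float) -> int:
        st = self.flows.setdefault(flow_id, FlowState())
        st.bytes_seen += size
        is_elephant = st.bytes_seen >= ELEPHANT_BYTES
        gap_expired = (now - st.last_seen) > self.timeout
        # Mice flows: choose a port for every packet.
        # Elephant flows: keep the port until the inter-packet gap
        # exceeds the timeout, which starts a new flowlet.
        if not is_elephant or st.port < 0 or gap_expired:
            st.port = self.least_loaded_port()
        st.last_seen = now
        self.queue_len[st.port] += 1
        return st.port
```

With two ports and a timeout of 0.5, successive packets of a small flow alternate between ports as the queues fill. A large flow stays on its port until a gap longer than 0.5 opens, and only then moves to the less loaded port.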