BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks

Yuan Liu, Wenxin Li, Wenyu Qu, Heng Qi
DOI: https://doi.org/10.1145/3545008.3545021
2022-01-01
Abstract: Load balancing is essential for datacenter networks. However, prior solutions have significant limitations: they are either oblivious to congestion or require a daunting, time-consuming parameter-tuning effort over their heuristics to achieve good performance. Thus, we ask: is it possible to learn to balance datacenter traffic? While deep reinforcement learning (DRL) sounds like a good answer, we observe that it is too heavyweight because of its long decision-making latency. Therefore, we introduce BULB, a lightweight and automated datacenter load balancer. BULB learns link weights that guide the end-hosts in spreading traffic, which frees the central agent from making quick flow-level decisions. BULB trains a DRL agent offline to optimize link weights, then uses an imitation-learning-based approach to faithfully translate the agent's DNN into a decision tree for online deployment. We implement a BULB prototype with a popular machine learning framework and evaluate it extensively in ns-3. The results show that BULB achieves up to 36.6%/56.4%, 19.9%/42.5%, 35.9%/54.8%, and 45.1%/67.7% better average/tail flow completion time than ECMP, CONGA, LetFlow, and Hermes, respectively. Moreover, converting the DNN into a decision tree reduces the decision latency by a factor of 175 while incurring only a 2% performance loss.
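To make the idea of link weights guiding end-hosts concrete, the following is a minimal sketch of weighted path selection at an end-host. It is not taken from the paper: the abstract does not say how per-link weights are combined into path choices, so the product rule, the function names `path_weights` and `pick_path`, and the two-path leaf-spine topology are all assumptions made for illustration.

```python
import random
from math import prod


def path_weights(paths, link_weights):
    """Combine per-link weights into one weight per path.

    Assumption: a path's weight is the product of its link weights.
    The abstract does not specify the combination rule.
    """
    return [prod(link_weights[link] for link in path) for path in paths]


def pick_path(paths, link_weights, rng=random):
    """Return the index of a path sampled in proportion to its weight."""
    weights = path_weights(paths, link_weights)
    return rng.choices(range(len(paths)), weights=weights, k=1)[0]


# Hypothetical topology: two paths from leaf1 to leaf2 via spines s1 and s2.
link_weights = {
    ("leaf1", "s1"): 0.8, ("s1", "leaf2"): 0.5,
    ("leaf1", "s2"): 0.6, ("s2", "leaf2"): 1.0,
}
paths = [
    [("leaf1", "s1"), ("s1", "leaf2")],
    [("leaf1", "s2"), ("s2", "leaf2")],
]
```

Under these assumed weights the two paths score roughly 0.4 and 0.6, so an end-host calling `pick_path(paths, link_weights)` for each new flow would send about 60% of flows over the second path. The point of the sketch is that the per-flow choice is a cheap local operation, while the weights themselves can be refreshed by a central agent at a much coarser timescale.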