FASTFLOW: Flexible Adaptive Congestion Control for High-Performance Datacenters
Tommaso Bonato, Abdul Kabbani, Daniele De Sensi, Rong Pan, Yanfang Le, Costin Raiciu, Mark Handley, Timo Schneider, Nils Blach, Ahmad Ghalayini, Daniel Alves, Michael Papamichael, Adrian Caulfield, Torsten Hoefler
2024-09-21
Abstract: The growing demand from machine learning (ML) workloads in datacenters places significant stress on current congestion control (CC) algorithms, many of which struggle to maintain performance at scale. These workloads generate bursty, synchronized traffic that requires both rapid response and fairness across flows. Existing CC algorithms that rely heavily on delay as their primary congestion signal often fail to react quickly enough and do not consistently ensure fairness. In this paper, we propose FASTFLOW, a streamlined sender-based CC algorithm that integrates delay, ECN signals, and optional packet trimming to make precise, real-time adjustments to congestion windows. Central to FASTFLOW is the QuickAdapt mechanism, which provides accurate bandwidth estimation at the receiver, enabling faster reactions to network conditions. We also show that FASTFLOW can enhance receiver-based algorithms such as EQDS by improving their ability to manage in-network congestion. Our evaluation shows that FASTFLOW outperforms state-of-the-art solutions, including EQDS, Swift, BBR, and MPRDMA, delivering performance improvements of up to 50% in modern datacenter networks.
Networking and Internet Architecture