FlowStar: Fast Convergence Per-Flow State Accurate Congestion Control for InfiniBand

Changyun Luo,Huaxi Gu,Lijing Zhu,Huixia Zhang
DOI: https://doi.org/10.1109/tnet.2024.3363658
2024-01-01
IEEE/ACM Transactions on Networking
Abstract:According to the latest TOP500 list, InfiniBand (IB) is the most widely used network architecture in the top 10 supercomputers. IB relies on Credit-based Flow Control (CBFC) to provide a lossless network and InfiniBand congestion control (IB CC) to relieve congestion, however, this can lead to the problem of victim flow since messages are mixed in the same queue and long-lived congestion spreading due to slow convergence. To deal with these problems, in this paper, we propose FlowStar, a fast convergence per-flow state accurate congestion control for InfiniBand. FlowStar includes two core mechanisms: 1) optimized per-flow CBFC mechanism provides flow state control to detect real congestion; and 2) rate adjustment rules make up for the mismatch between the original IB CC rate regulation and the per-hop CBFC to alleviate congestion spreading. FlowStar implements a per-flow congestion state on switches and can obtain in-flight packet information without additional parameter settings to ensure a lossless network. Evaluations show that FlowStar improves average and tail message complete time under different workloads.
telecommunications,computer science, theory & methods,engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?