DCQCN Advanced (DCQCN-A): Combining ECN and RTT for RDMA Congestion Control

Yongrui Hu, Zheng Shi, Yang Nie, Liguo Qian
DOI: https://doi.org/10.1109/itnec52019.2021.9586872
2021-01-01
Abstract: Since it was proposed in 2015, DCQCN has gradually become a common congestion control solution for remote direct memory access (RDMA), which is rapidly becoming a basic feature of high-speed clusters and data center networks. However, DCQCN struggles to meet the ultralow-latency and high-bandwidth requirements of large data centers. As a congestion control (CC) algorithm dominated by explicit congestion notification (ECN), DCQCN is prone to queue buildup, which causes delay and even packet loss in large-scale communication. This paper therefore proposes DCQCN-A, an algorithm that combines ECN and round-trip time (RTT) to improve congestion control. In simulation, DCQCN-A can handle incast congestion of at least 2048 flows, 4 times more than DCQCN. Microbenchmark results demonstrate the convergence, fairness, and adaptability of DCQCN-A. Under realistic loads, the average latency of DCQCN-A is 13.2 µs in the Websearch load, 45% lower than that of DCQCN, and 15% lower than that of DCQCN in the FB_Hadoop load.