DDT: Dynamical Selective Dropping Threshold for Reactive Congestion Control

Hongze Zhou, Dinghuang Hu, Zejia Zhou, Guoyuan Yuan, Dezun Dong
DOI: https://doi.org/10.1145/3674399.3674412
2024-01-01
Abstract: Traditional congestion control algorithms (CCAs) frequently struggle to manage microbursts, resulting in performance degradation. Although RoCEv2 (RDMA over Converged Ethernet version 2) employs Priority Flow Control (PFC) to establish lossless networks and enhance burst tolerance, it does not guarantee low latency or high throughput due to challenges such as deadlocks, head-of-line (HoL) blocking, and congestion spreading. Consequently, investigating methods to reduce the side effects of PFC in lossless networks or improve performance in lossy networks without PFC is critical. We introduce DDT, a mechanism designed to prevent timeouts by categorizing packets according to the impact of their loss. DDT implements the Important Packet First and Important Packet Sustainability principles to prioritize important packets, ensuring their reliable transmission by actively dropping unimportant packets if necessary. DDT dynamically adjusts its selective dropping threshold based on the change ratio of the number of flows at switches, optimizing burst tolerance and reducing tail latency for short incast flows while minimizing the impact on the completion times of background flows. DDT is compatible with existing reactive window-based transport protocols. We have integrated DDT into HPCC and have demonstrated through large-scale simulations that: (1) DDT with PFC can reduce tail latency by up to 83.2% and greatly reduce the side effects of PFC on network performance; (2) DDT without PFC can mitigate timeouts and reduce tail latency by up to 83.7%.
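The selective dropping idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify how packets are classified or how the threshold is computed. The `important` flag, the class `SelectiveDropQueue`, its parameters, and the rule that the threshold shrinks in proportion to the growth ratio of the flow count are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: int
    important: bool  # hypothetical flag; the abstract does not define the classification


class SelectiveDropQueue:
    """Toy switch queue with a selective dropping threshold (illustrative only)."""

    def __init__(self, capacity, base_threshold, min_threshold):
        self.capacity = capacity              # hard buffer limit, in packets
        self.base_threshold = base_threshold  # threshold used when flow count is stable
        self.min_threshold = min_threshold    # floor so unimportant traffic is not starved
        self.threshold = base_threshold
        self.queue = []
        self.prev_flows = 0

    def update_threshold(self, active_flows):
        # Assumed rule: when the number of flows grows by some ratio, divide the
        # base threshold by that ratio to reserve more room for important packets.
        if self.prev_flows > 0 and active_flows > self.prev_flows:
            ratio = active_flows / self.prev_flows
            self.threshold = max(self.min_threshold, int(self.base_threshold / ratio))
        else:
            self.threshold = self.base_threshold
        self.prev_flows = active_flows

    def enqueue(self, pkt):
        depth = len(self.queue)
        if depth >= self.capacity:
            return False  # buffer full: every packet is dropped
        if not pkt.important and depth >= self.threshold:
            return False  # selective drop: only unimportant packets are refused
        self.queue.append(pkt)
        return True
```

In this sketch, a jump from 2 to 6 active flows with `base_threshold=6` lowers the threshold to 2, so unimportant packets are refused once two packets are queued, while important packets are still accepted until the buffer itself is full.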