Explicit Dropping Notification in Data Centers

Qingkai Meng, Yiran Zhang, Chaolei Hu, Bo Wang, Fengyuan Ren
DOI: https://doi.org/10.1109/infocom52122.2024.10621312
2024-01-01
Abstract: Datacenter applications increasingly demand microsecond-scale latency and tight tail latency. Despite recent advances in datacenter transport protocols, we notice that the timeout caused by packet loss is the killer of microsecond-scale latency. Moreover, refining the RTO setting is impractical due to the significant fluctuations in RTT. In this paper, we propose explicit dropping notification (EDN) to avoid timeouts. EDN rekindles ICMP Source Quench, where the switch notifies the source of precise packet loss information. Then the source can rapidly pinpoint dropped packets for fast retransmission instead of waiting for timeouts. More importantly, fast retransmission does not mean immediate retransmission, which is prone to aggravate congestion and deteriorate latency. In light of this, we suggest finessing the timing and sending rate of retransmission. Specifically, as a reward of the paradigm shift to explicit notification, the source can pause for the queue draining time piggybacked on EDN messages and estimate connection capacity to figure out a proper sending rate, thus avoiding congestion aggravation. We implement EDN on the P4-programmable switching ASIC and Linux kernel. Evaluations show that, compared with state-of-the-art loss recovery schemes, EDN reduces the latency by up to 4.1× on average and 3.6× at the 99th-percentile.
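The retransmission timing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `EdnMessage` fields, the `plan_retransmissions` function, and the fixed packet size are hypothetical names chosen for illustration, and the capacity estimate is simply passed in as `rate_bps` because the abstract does not say how it is computed. The sketch only shows the idea of pausing for the advertised queue draining time and then pacing the dropped packets at the estimated rate, rather than retransmitting immediately.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EdnMessage:
    """Hypothetical view of an EDN notification as seen by the source."""
    dropped_seqs: List[int]        # sequence numbers the switch reported as dropped
    queue_drain_time_us: float     # queue draining time piggybacked by the switch


def plan_retransmissions(
    edn: EdnMessage,
    now_us: float,
    rate_bps: float,               # assumed: capacity estimate supplied by the caller
    packet_bytes: int = 1500,      # assumed: fixed packet size for pacing
) -> List[Tuple[float, int]]:
    """Return (send_time_us, seq) pairs for the dropped packets.

    The first retransmission waits until the queue is expected to have
    drained; later ones are spaced by one packet's serialization time at
    the estimated rate.
    """
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    gap_us = packet_bytes * 8 / rate_bps * 1e6
    start_us = now_us + edn.queue_drain_time_us
    return [
        (start_us + i * gap_us, seq)
        for i, seq in enumerate(sorted(edn.dropped_seqs))
    ]


if __name__ == "__main__":
    edn = EdnMessage(dropped_seqs=[7, 3], queue_drain_time_us=20.0)
    for send_time_us, seq in plan_retransmissions(edn, now_us=100.0, rate_bps=12e9):
        print(f"retransmit seq={seq} at t={send_time_us:.2f} us")
```

In this sketch the lowest dropped sequence number is sent first, once the pause has elapsed; a real sender would also need to handle further EDN messages arriving during the pause, which is left out here.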