Switch-Assistant Loss Recovery for RDMA Transport Control
Qingkai Meng, Yiran Zhang, Shan Zhang, Zhiyuan Wang, Tong Zhang, Hongbin Luo, Fengyuan Ren
DOI: https://doi.org/10.1109/tnet.2023.3336661
2024-01-01
Abstract: RoCEv2 (RDMA over Converged Ethernet version 2) is the canonical method for deploying RDMA in Ethernet-based datacenters. Traditionally, RoCEv2 runs over a lossless network, which is achieved by enabling Priority Flow Control (PFC) within the network. However, as datacenters scale up, PFC’s side effects, such as head-of-line blocking, congestion spreading, and pause frame storms, are amplified. Datacenter operators can no longer tolerate these problems and are therefore seeking PFC alternatives for RDMA networks. Rather than aiming for a lossless RDMA network, we instead handle packet loss effectively to support RDMA over Ethernet. In this paper, we propose Switch-assistant Loss Recovery (SLR), a switch building block that enhances RoCEv2’s loss recovery. Specifically, SLR-enabled switches send loss notifications to request fast retransmissions. To cooperate with go-back-N retransmission, SLR generates loss notifications only when expected packets (i.e., in-order packets expected by receivers) are dropped and then filters out unexpected packets, which avoids timeouts and prevents exacerbating congestion. Further, we adapt SLR to multi-bottleneck scenarios by inferring expected packets among multiple switch views. We implement SLR prototypes on commodity programmable switches. Evaluations show that SLR reduces the 99.9th-percentile flow completion time (FCT) slowdown by up to 21.6$\times$ compared to PFC and other state-of-the-art schemes.
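
The single-switch behavior described in the abstract (notify only when an expected packet is dropped, then filter unexpected packets) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `SlrSwitchSketch`, its per-flow `expected_psn` counter, the tail-drop queue model, and the return strings are all hypothetical names chosen for illustration, and the multi-bottleneck inference is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class FlowState:
    # Assumption: the switch tracks, per flow, the next in-order packet
    # sequence number (PSN) that the go-back-N receiver expects.
    expected_psn: int = 0
    # True while the switch waits for the retransmitted expected packet.
    filtering: bool = False


@dataclass
class SlrSwitchSketch:
    """Hypothetical single-switch model with a tail-drop queue."""
    queue_capacity: int
    queue: list = field(default_factory=list)
    flows: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)

    def on_packet(self, flow_id, psn):
        state = self.flows.setdefault(flow_id, FlowState())

        # While recovering, unexpected packets are filtered: a go-back-N
        # receiver would discard them anyway, so forwarding them only
        # adds load to a congested path.
        if state.filtering and psn != state.expected_psn:
            return "filtered"

        if len(self.queue) >= self.queue_capacity:
            if psn == state.expected_psn:
                # The expected packet is lost: ask the sender for a fast
                # retransmission instead of waiting for a timeout.
                self.notifications.append((flow_id, psn))
                state.filtering = True
                return "dropped+notified"
            return "dropped"

        self.queue.append((flow_id, psn))
        if psn == state.expected_psn:
            # In-order packet forwarded: advance and leave recovery.
            state.expected_psn += 1
            state.filtering = False
        return "forwarded"

    def drain(self):
        """Model the egress port emptying the queue."""
        self.queue.clear()
```

In this sketch, a switch with room for two packets forwards PSNs 0 and 1, drops PSN 2 and emits a single notification for it, filters PSN 3, and resumes normal forwarding once the queue drains and PSN 2 is retransmitted. Issuing one notification per loss event, rather than one per dropped packet, matches the go-back-N model in which the sender resends everything from the expected packet onward.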