Lightning: A Practical Building Block for RDMA Transport Control

Qingkai Meng,Fengyuan Ren
DOI: https://doi.org/10.1109/iwqos52092.2021.9521326
2021-01-01
Abstract:RoCEv2 (RDMA over Converged Ethernet version 2) is the canonical method for deploying RDMA in Ethernet-based datacenters. Traditionally, RoCEv2 runs over the lossless network which is in turn achieved by enabling Priority Flow Control (PFC) within the network. However, with the scale of data center increases, PFC’s side effects, such as head-of-line blocking, congestion spreading, and PFC storms, are amplified. Datacenter operators can no longer tolerate these problems. They are seeking PFC alternatives for RDMA networks. Rather than aim at the lossless RDMA network, we instead handle packet loss effectively to support RDMA over Ethernet.In this paper, we propose Lightning, a switch building block to enhance RoCE’s simple loss recovery. Lightning enhances the switches to send loss notifications directly to the sources with high priority, thus informing sources as quickly as possible. Then, sources can retransmit packets sooner. By addressing challenges such as that shared buffer status is not available at ingress in modern switches, Lightning generates loss notification only when the expected packet is dropped and filters other unexpected packets at ingress, so as to avoid timeouts and prevent unnecessary congestion from unexpected packets. We implement Lightning on commodity programmable switches. In our evaluation, Lightning achieves up to 16.08× reduction of 99.9th percentile flow completion time compared to PFC, IRN and other alternatives.
What problem does this paper attempt to address?