Flor: An Open High Performance RDMA Framework Over Heterogeneous RNICs.
Qiang Li, Yixiao Gao, Xiaoliang Wang, Haonan Qiu, Yanfang Le, Derui Liu, Qiao Xiang, Fei Feng, Peng Zhang, Bo Li, Jianbo Dong, Lingbo Tang, Hongqiang Harry Liu, Shaozong Liu, Weijie Li, Rui Miao, Yaohui Wu, Zhiwu Wu, Chao Han, Lei Yan, Zheng Cao, Zhongjie Wu, Chen Tian, Guihai Chen, Dennis Cai, Jinbo Wu, Jiaji Zhu, Jiesheng Wu, Jiwu Shu
2023-01-01
Abstract: Datacenter applications increasingly adopt RDMA for its ultra-low latency and low CPU overhead. However, RDMA-capable NICs (RNICs) from different vendors, or from different generations of the same vendor, do not cooperate well, which can cause bandwidth imbalance in the production network and introduce new root causes of PFC storms. Our key observation is that although the data path functions of heterogeneous RNICs follow the same RoCEv2 specification, their control path functions are vendor- and version-specific. To this end, we propose Flor, an open framework that provides a unified hardware data plane atop heterogeneous RNICs and a flexible software control plane running on host CPUs or on the NPUs of RNICs and DPUs. The hardware plane requires no changes to current specifications. The software plane on-loads congestion control and reliability management in large-scale lossy Ethernet with no PFC dependency. We implemented and evaluated Flor in both testbed and production clusters over Intel E810, Mellanox CX-4 and CX-5, and Broadcom RNICs. Experiments show that Flor achieves performance comparable to vanilla RDMA in many scenarios, including 1/4096 packet loss, 6000:1 incast, and large-scale cross-pod communication. Flor narrows the performance gap between CX-4 and CX-5 RNICs from 24.3% to 1.3% when they are deployed together.