MC-RDMA: Improving Replication Performance of RDMA-based Distributed Systems with Reliable Multicast Support
Chengyuan Huang, Yixiao Gao, Wei Chen, Duoxing Li, Yibo Xiao, Ruyi Zhang, Chen Tian, Xiaoliang Wang, Wanchun Dou, Guihai Chen, Yi Wang, Fu Xiao
DOI: https://doi.org/10.1109/icnp59255.2023.10355619
2023-01-01
Abstract: Remote Direct Memory Access (RDMA) has been widely adopted in distributed storage systems. However, it supports only unicast operations, which significantly degrades data replication performance because of wasted bandwidth and CPU overhead. To address this problem, we propose MC-RDMA, a distributed and reliable multicast RDMA. It is compatible with existing unicast RDMA but supports lazy packet replication with reliable RDMA multicasting. The key idea of MC-RDMA is to use in-network programmable switches to build a NIC-transparent reliable multicast protocol for RDMA. MC-RDMA combines the address information of IP and RoCEv2 into a sender-initialized multicast routing protocol. In addition, it synchronizes the hardware transmission states of multiple receivers by merging ACKs and NAKs. To verify the effectiveness of MC-RDMA, we implement it with Mellanox ConnectX-6 commodity RNICs and Intel Tofino P4 programmable switches. Experimental results show that MC-RDMA can double the sender bandwidth utilization and significantly reduce CPU overhead compared to unicast-based RDMA replication. Moreover, it reduces storage request latency by about 30% with realistic workloads and decreases training time by about 50% in a distributed training system.
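The ACK/NAK merging mentioned in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's switch implementation: it assumes cumulative ACKs identified by a packet sequence number (PSN), ignores PSN wraparound, and forwards to the sender only the lowest PSN acknowledged by every receiver. The names `AckMerger`, `on_ack`, and `on_nak` are hypothetical.

```python
class AckMerger:
    """Toy model of merging per-receiver ACKs into one ACK stream for the sender.

    Assumptions (not taken from the paper): ACKs are cumulative, PSNs grow
    monotonically without wraparound, and the receiver set is fixed.
    """

    def __init__(self, receivers):
        # Highest cumulative PSN acknowledged by each receiver (-1 = none yet).
        self.acked = {r: -1 for r in receivers}
        # Highest merged PSN already forwarded to the sender.
        self.last_forwarded = -1

    def on_ack(self, receiver, psn):
        """Record an ACK; return the merged PSN to forward, or None to suppress."""
        self.acked[receiver] = max(self.acked[receiver], psn)
        # The sender may only treat a packet as delivered once ALL receivers have it.
        merged = min(self.acked.values())
        if merged > self.last_forwarded:
            self.last_forwarded = merged
            return merged
        return None

    def on_nak(self, receiver, expected_psn):
        """Record a NAK; return the PSN from which the sender should retransmit.

        A NAK for expected_psn implies the receiver holds everything before it.
        """
        self.acked[receiver] = max(self.acked[receiver], expected_psn - 1)
        return expected_psn


merger = AckMerger(["r1", "r2"])
first = merger.on_ack("r1", 5)   # r2 has acknowledged nothing yet, so suppressed
second = merger.on_ack("r2", 3)  # both receivers now hold PSN <= 3
third = merger.on_ack("r2", 7)   # the slowest receiver (r1) is at PSN 5
```

In this model the sender sees a single ACK stream that advances at the pace of the slowest receiver, which is one plausible way to keep an unmodified unicast RNIC consistent with several receivers at once.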