Achieving Low Latency for Multipath Transmission in RDMA Based Data Center Network

Zhaoyi Li, Jiawei Huang, Shiqi Wang, Jianxin Wang
DOI: https://doi.org/10.1109/tcc.2024.3365075
IF: 5.697
2024-03-08
IEEE Transactions on Cloud Computing
Abstract: Remote Direct Memory Access (RDMA) achieves ultra-low latency, high throughput and low CPU overhead in data centers by implementing the transport logic in the hardware network interface card (NIC). However, RDMA faces new challenges in heterogeneous multipath environments because it is very sensitive to packet reordering. When some packets are blocked on slow paths, the packets delivered through fast paths have to be buffered at the receiver's NIC, consuming the limited on-chip memory resources. In this paper, we propose a new RDMA-based multipath transmission scheme with advanced fast retransmission, called AFR-MPRDMA. Specifically, once it detects congestion on a slow path, the sender retransmits the blocked packets on other, faster paths to speed up their delivery. Moreover, the receiver dynamically adjusts the buffer size for out-of-order packets to avoid both unnecessary retransmission and long latency. The results of large-scale tests show that AFR-MPRDMA effectively mitigates the packet blocking issue and reduces average flow completion time (AFCT) by up to 61% compared with state-of-the-art RDMA-based schemes.
computer science, information systems, theory & methods
What problem does this paper attempt to address?
This paper addresses the packet blocking problem that multipath transmission faces in RDMA-based data center networks. RDMA is very sensitive to packet reordering: when some packets are blocked on a slow path, the packets delivered over fast paths must be buffered on the receiver's network interface card (NIC), which consumes its limited on-chip memory. The result is packet reordering and acknowledgment (ACK) blocking, both of which degrade transmission performance.

To solve this, the paper proposes a new RDMA-based multipath transmission scheme, AFR-MPRDMA (Multi-Path RDMA with Advanced Fast Retransmission). Its main features are as follows:

1. **Advanced Fast Retransmission Mechanism**: Once congestion on a slow path is detected, the sender retransmits the blocked packets over other, faster paths so that they arrive sooner.
2. **Dynamically Adjusted Receiver Buffer Size**: The receiver adjusts the size of the bitmap used to track buffered out-of-order packets according to the degree of reordering it observes, avoiding both unnecessary retransmissions and long latency.

With these improvements, AFR-MPRDMA alleviates packet blocking, reduces the average flow completion time (AFCT), and improves link utilization. Experimental results show that, compared with existing state-of-the-art RDMA-based schemes, AFR-MPRDMA reduces the average flow completion time by up to 61%.
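The two mechanisms can be illustrated with a small Python sketch. This is not the paper's algorithm: the summary does not give the actual congestion signal or the resizing rule, so everything here is an assumption made for illustration. The names `Receiver`, `on_packet`, and `pick_retransmit_path` are hypothetical, the "double the window when a packet falls outside it, halve it when reordering clears" policy is invented, and a Python set stands in for the NIC bitmap.

```python
from dataclasses import dataclass, field


def pick_retransmit_path(path_rtts: dict, blocked_path: str) -> str:
    """Pick the lowest-RTT path other than the blocked one (assumed selection rule)."""
    candidates = {p: rtt for p, rtt in path_rtts.items() if p != blocked_path}
    return min(candidates, key=candidates.get)


@dataclass
class Receiver:
    """Tracks out-of-order arrivals in a window whose size changes over time."""
    min_window: int = 4
    max_window: int = 64
    window: int = 4        # current bitmap size, in packets
    expected: int = 0      # next in-order packet sequence number
    buffered: set = field(default_factory=set)  # stands in for the NIC bitmap

    def on_packet(self, psn: int) -> str:
        if psn < self.expected or psn in self.buffered:
            return "duplicate"
        if psn - self.expected >= self.window:
            # The packet does not fit in the bitmap, so it is dropped and would
            # have to be retransmitted. Hypothetical policy: grow the window so
            # that similar reordering fits next time.
            self.window = min(self.max_window, self.window * 2)
            return "dropped"
        self.buffered.add(psn)
        delivered = 0
        # Release every packet that is now contiguous with the in-order stream.
        while self.expected in self.buffered:
            self.buffered.remove(self.expected)
            self.expected += 1
            delivered += 1
        if delivered and not self.buffered:
            # Hypothetical policy: reordering has cleared, so shrink the window
            # and free on-chip memory.
            self.window = max(self.min_window, self.window // 2)
        return "delivered" if delivered else "buffered"
```

In this sketch a window that is too small drops packets that arrived early and forces retransmissions, while a window that is too large holds scarce NIC memory for longer than needed; that is the trade-off the dynamic sizing is meant to balance.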