RB2: Narrow the Gap Between RDMA Abstraction and Performance Via a Middle Layer

Haifeng Sun, Yixuan Tan, Yongtong Wu, Jiaqi Zhu, Qun Huang, Xin Yao, Gong Zhang
DOI: https://doi.org/10.1109/infocom52122.2024.10621169
2024-01-01
Abstract: Although the native RDMA interface allows for high throughput and low latency, its low-level abstraction raises significant programming challenges. Consequently, numerous systems encapsulate the RDMA interface in more user-friendly high-level abstractions such as Socket, MPI, and RPC. However, this ease of development often incurs considerable performance degradation. To address this trade-off, this paper introduces RB2, a high-performance RDMA-based Distributed Ring Buffer (DRB). RB2 serves as a middle layer that conceals the low-level details of the RDMA interface while remaining easy to extend to other high-level abstractions. Nonetheless, it is non-trivial for DRBs to preserve RDMA performance. We optimize the performance of RB2 in three aspects. First, we perform micro-benchmarks to identify pointer synchronization methods that seem counter-intuitive but deliver the best performance. Second, we propose an adaptive batching mechanism to alleviate the limitations of conventional fixed batching. Finally, we build an efficient memory subsystem using various optimization techniques. RB2 outperforms state-of-the-art designs by achieving 2.5x to 7.5x their throughput while maintaining comparable tail latency for small messages.
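
For readers unfamiliar with ring buffers, the following is a minimal local sketch of the data structure the abstract refers to. It is not RB2's design and involves no RDMA: the class name `RingBuffer`, its methods, and the slot-based layout are all hypothetical, chosen only to show what the "pointers" are (a producer-side tail and a consumer-side head) and why publishing a pointer once per batch, rather than once per message, reduces the number of synchronization steps.

```python
class RingBuffer:
    """Fixed-capacity single-producer/single-consumer ring buffer.

    Purely local illustration (assumption): in a distributed ring buffer the
    head and tail would live on different machines and be synchronized over
    the network, which is the cost the abstract's optimizations target.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0  # total items consumed; advanced only by the consumer
        self.tail = 0  # total items produced; advanced only by the producer

    def free_slots(self):
        # Occupied slots are tail - head, so the rest are free.
        return self.capacity - (self.tail - self.head)

    def push_batch(self, items):
        """Write as many items as fit, then advance the tail once."""
        n = min(len(items), self.free_slots())
        for i in range(n):
            self.slots[(self.tail + i) % self.capacity] = items[i]
        self.tail += n  # one pointer update for the whole batch
        return n

    def pop_all(self):
        """Read every available item, then advance the head once."""
        n = self.tail - self.head
        out = [self.slots[(self.head + i) % self.capacity] for i in range(n)]
        self.head += n  # one pointer update for the whole batch
        return out
```

In this sketch the batch size is simply whatever the caller passes in; how RB2 chooses batch sizes adaptively and how it synchronizes the pointers across machines are described in the paper itself, not here.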