RDMAvisor: Toward Deploying Scalable and Simple RDMA as a Service in Datacenters

Zhi Wang, Xiaoliang Wang, Zhuzhong Qian, Baoliu Ye, Sanglu Lu
DOI: https://doi.org/10.48550/arXiv.1802.01870
2018-02-06
Abstract: RDMA is increasingly adopted by cloud computing platforms to provide low CPU overhead, low latency, high throughput network services. On the other hand, however, it is still challenging for developers to realize fast deployment of RDMA-aware applications in the datacenter, since the performance is highly related to many low-level details of RDMA operations. To address this problem, we present a simple and scalable RDMA as a Service (RaaS) to mitigate the impact of RDMA operational details. RaaS provides careful message buffer management to improve CPU/memory utilization and improve the scalability of RDMA operations. These optimized designs lead to a simple and flexible programming model for common and knowledgeable users. We have implemented a prototype of RaaS, named RDMAvisor, and evaluated its performance on a cluster with a large number of connections. Our experiment results demonstrate that RDMAvisor achieves high throughput for thousands of connections and maintains low CPU and memory overhead through adaptive RDMA transport selection.
Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Performance
What problem does this paper attempt to address?
This paper addresses several key challenges in rapidly deploying Remote Direct Memory Access (RDMA) services in data centers:

1. **Operational Availability**: Current RDMA network functions are tightly coupled to their target systems, which makes the code hard to reuse in other applications. Moreover, because RDMA system design depends heavily on low-level details, such as Network Interface Card (NIC) architecture and operational details, an improper configuration can yield performance even lower than that of the traditional TCP/IP stack.
2. **Scalability**: Modern data centers must handle a large number of connections and scale up gradually. However, because of its limited cache space, the current ConnectX-3 NIC can only support approximately 400 Queue Pairs (QPs). A connection-oriented transport that gives each connection exclusive access to a QP therefore does not scale well. Sharing QPs is a common solution, but existing methods may reduce CPU efficiency because threads compete for locks.
3. **Resource Utilization**: In a shared data-center environment, two-sided operations require the receiver to post receive Work Requests (WRs) in advance, which may lead to inefficient use of memory or the network. In addition, data receivers may not be aware that the Receive Queue (RQ) is starving, so the corresponding polling threads waste CPU resources.

To address these challenges, the paper proposes a simple and scalable RDMA-as-a-Service (RaaS) framework, implemented as a prototype named RDMAvisor. RDMAvisor tackles the above problems in the following ways:

- **Abstracting Flexible RDMA Functions**: A Socket-like network interface hides complex low-level details so that ordinary users can use RDMA easily, while users with special requirements can still customize settings.
- **Efficient Lock-Free Design**: An efficient lock-free design supports thousands of connections within a host, improving scalability and reducing CPU overhead.
- **Resource Sharing**: Sharing resources, such as Shared Receive Queues (SRQs), among multiple applications achieves high memory utilization and low CPU overhead.

Experimental results show that RDMAvisor achieves high throughput when handling a large number of connections and maintains low CPU and memory overhead through adaptive selection of RDMA transports.
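
The memory argument behind sharing receive queues can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not taken from the paper: the function names and all numbers (connection count, queue depth, buffer size) are hypothetical and chosen only for illustration. It compares the receive-buffer memory reserved when every connection pre-posts its own receive WRs with the memory reserved when all connections draw from a single shared pool.

```python
def per_connection_rq_bytes(connections: int, wrs_per_queue: int, buffer_bytes: int) -> int:
    """Receive-buffer memory reserved when every connection pre-posts its own WRs."""
    return connections * wrs_per_queue * buffer_bytes


def shared_rq_bytes(shared_wrs: int, buffer_bytes: int) -> int:
    """Receive-buffer memory reserved when all connections share one receive queue."""
    return shared_wrs * buffer_bytes


# Illustrative values only; none of them come from the paper.
CONNECTIONS = 1000      # hypothetical number of connections on one host
WRS_PER_QUEUE = 64      # hypothetical receive WRs pre-posted per connection
BUFFER_BYTES = 4096     # hypothetical size of each receive buffer
SHARED_WRS = 4096       # hypothetical depth of the single shared receive queue

dedicated = per_connection_rq_bytes(CONNECTIONS, WRS_PER_QUEUE, BUFFER_BYTES)
shared = shared_rq_bytes(SHARED_WRS, BUFFER_BYTES)

print(f"per-connection RQs: {dedicated / 2**20:.1f} MiB")
print(f"one shared RQ:      {shared / 2**20:.1f} MiB")
```

Under these assumed numbers, the dedicated scheme grows linearly with the number of connections, while the shared pool is sized for aggregate load. The right shared depth depends on the actual traffic pattern, which this sketch does not model.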