NP-RDMA: Using Commodity RDMA without Pinning Memory

Huijun Shen, Guo Chen, Bojie Li, Xingtong Lin, Xingyu Zhang, Xizheng Wang, Amit Geron, Shamir Rabinovitch, Haifeng Lin, Han Ruan, Lijun Li, Jingbin Zhou, Kun Tan
2023-10-17
Abstract: Remote Direct Memory Access (RDMA) has been haunted by the need of pinning down memory regions. Pinning limits the memory utilization because it impedes on-demand paging and swapping. It also increases the initialization latency of large memory applications from seconds to minutes. To remove memory pinning, existing approaches often require special hardware which supports page fault, and still have inferior performance. We propose NP-RDMA, which removes memory pinning during memory registration and enables dynamic page fault handling with commodity RDMA NICs. NP-RDMA does not require NICs to support page fault. Instead, by monitoring local memory paging and swapping with MMU-notifier, combined with IOMMU/SMMU-based address mapping, NP-RDMA efficiently detects and handles page faults in software with near-zero additional latency to non-page-fault RDMA verbs. We implement an LD_PRELOAD library (with a modified kernel module), which is fully compatible with existing RDMA applications. Experiments show that NP-RDMA adds only 0.1 to 2 μs latency under non-page-fault scenarios. Moreover, NP-RDMA adds only 3.5 to 5.7 μs and 60 μs under minor or major page faults, respectively, which is 500x faster than ODP which uses advanced NICs that support page fault. With non-pinned memory, Spark initialization is 20x faster and the physical memory usage reduces by 86% with only 5.4% slowdown. Enterprise storage can expand to 5x capacity with SSDs while the average latency is only 10% higher. To the best of our knowledge, NP-RDMA is the first efficient and application-transparent software approach to remove memory pinning using commodity RDMA NICs.
Networking and Internet Architecture
What problem does this paper attempt to address?
The paper aims to address the memory pinning issue in Remote Direct Memory Access (RDMA) technology. Memory pinning limits memory utilization and increases the initialization delay of large memory applications. To overcome these issues, the research team proposes the NP-RDMA method.

### Main Issues

- **Memory Pinning Issue**: RDMA applications need to "pin" memory regions before accessing local or remote memory. Pinning fixes the mapping from virtual addresses to physical pages, so the memory cannot be paged or swapped out on demand. This leads to several major issues:
  - Impact on memory utilization: Even if the pages are never accessed, they still occupy physical memory, making on-demand paging impossible.
  - Increased programming complexity: Applications cannot use the virtual memory abstraction provided by the operating system.
  - Slower initialization: For applications with large memory usage, the time spent pinning memory during initialization increases from seconds to minutes.

### Solution

- **NP-RDMA**: This method removes the need for memory pinning in software, without requiring special hardware support. Specifically, NP-RDMA implements dynamic page fault handling, efficiently detecting and handling page faults while maintaining compatibility with existing RDMA applications.

### Technical Details

- **Signature Pages and Blackhole Pages**: When pages are swapped out, their now-invalid virtual addresses are remapped through the IOMMU/SMMU to special signature pages or blackhole pages, which contain specific flag values. This prevents direct memory access (DMA) by the RDMA Network Interface Card (NIC) from failing.
- **Page Version Control**: For large-scale data transfers, a page version control mechanism is used to reduce overhead. Each virtual page is associated with a remotely accessible version number, which is incremented on every swap-in or swap-out to track the page state.
- **Two-Stage Processing**: NP-RDMA first attempts each RDMA read/write optimistically as a one-sided operation. If a potential page fault is detected, it switches to a two-sided operation to resolve the issue.
  - **One-Sided Optimistic Approach**: Identifies faulted pages by checking the data content, then re-executes the read/write using two-sided operations.
  - **Two-Sided Approach**: Converts each read or write into the reverse operation (a write or a read, respectively), temporarily pinning the local buffer while the data is sent back and forth.

### Experimental Results

- In non-page-fault scenarios, NP-RDMA adds only 0.1 to 2 microseconds of latency.
- Under minor page faults, it adds 3.5 to 5.7 microseconds of latency.
- Under major page faults, it adds about 60 microseconds of latency.
- Experiments also demonstrate performance improvements in real-world applications, such as Spark initialization time and TPC-DS benchmark tests.

### Summary

NP-RDMA is an efficient software solution that removes the need for memory pinning in RDMA without relying on special hardware. It addresses the issues caused by memory pinning by combining signature pages, page versioning, and two-stage processing, which significantly improves memory utilization and application performance.
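To make the two-stage processing described under Technical Details more concrete, the following is a minimal toy simulation in Python, not the authors' implementation. It models a remote host whose non-resident pages resolve to a signature page, a per-page counter that is incremented on every swap-in or swap-out, and a reader that falls back from the optimistic one-sided path to the two-sided path when the returned data matches the signature. The names `ToyRemoteHost` and `np_read`, the 4 KiB page size, and the `SIGNATURE` byte pattern are all assumptions made for illustration; the summary does not state the actual flag values or data structures.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096
# Hypothetical flag pattern; the real signature contents are not given in the text.
SIGNATURE = b"\xde\xad\xbe\xef" * (PAGE_SIZE // 4)


@dataclass
class ToyRemoteHost:
    """Toy model of a remote host whose registered memory is not pinned."""
    resident: dict = field(default_factory=dict)  # page number -> bytes in RAM
    swapped: dict = field(default_factory=dict)   # page number -> bytes on disk
    version: dict = field(default_factory=dict)   # page number -> swap counter

    def store(self, page, data):
        # Place zero-padded data in a resident page.
        self.resident[page] = data.ljust(PAGE_SIZE, b"\x00")
        self.version.setdefault(page, 0)

    def swap_out(self, page):
        self.swapped[page] = self.resident.pop(page)
        self.version[page] += 1  # version changes on every swap-out

    def swap_in(self, page):
        self.resident[page] = self.swapped.pop(page)
        self.version[page] += 1  # version changes on every swap-in

    def one_sided_read(self, page):
        # Models the NIC's DMA: a non-resident page resolves to the
        # signature page, so the read completes instead of failing.
        return self.resident.get(page, SIGNATURE)

    def two_sided_read(self, page):
        # Models the remote CPU serving the request: it brings the page
        # back into memory before returning the data.
        if page not in self.resident:
            self.swap_in(page)
        return self.resident[page]


def np_read(host, page):
    """Optimistic one-sided read with a two-sided retry on a suspected fault."""
    data = host.one_sided_read(page)
    if data == SIGNATURE:
        # The content looks like the signature page: treat it as a page fault.
        return host.two_sided_read(page), "two-sided"
    return data, "one-sided"


if __name__ == "__main__":
    host = ToyRemoteHost()
    host.store(0, b"hello")
    print(np_read(host, 0)[1])  # resident page: fast path
    host.swap_out(0)
    print(np_read(host, 0)[1])  # swapped-out page: fallback path
    print(host.version[0])      # one swap-out plus one swap-in
```

In this sketch, a resident page whose real content happens to equal the signature pattern would also take the two-sided path; that costs extra latency but still returns the correct data, because the fallback reads the page itself rather than trusting the optimistic result.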