Abstract: Remote Direct Memory Access (RDMA) has been haunted by the need to pin down memory regions. Pinning limits memory utilization because it prevents on-demand paging and swapping. It also increases the initialization latency of large-memory applications from seconds to minutes. To remove memory pinning, existing approaches often require special hardware that supports page faults, and they still deliver inferior performance. We propose NP-RDMA, which removes memory pinning during memory registration and enables dynamic page fault handling with commodity RDMA NICs. NP-RDMA does not require NICs to support page faults. Instead, it monitors local memory paging and swapping with the MMU notifier and combines this with IOMMU/SMMU-based address mapping. As a result, NP-RDMA efficiently detects and handles page faults in software, adding near-zero latency to non-page-fault RDMA verbs. We implement an LD_PRELOAD library (with a modified kernel module) that is fully compatible with existing RDMA applications. Experiments show that NP-RDMA adds only 0.1 to 2 μs of latency in non-page-fault scenarios. Under minor and major page faults, NP-RDMA adds only 3.5 to 5.7 μs and 60 μs, respectively, which is 500× faster than ODP (On-Demand Paging), which relies on advanced NICs that support page faults. With non-pinned memory, Spark initialization is 20× faster and physical memory usage drops by 86%, with only a 5.4% slowdown. Enterprise storage can expand its capacity 5× with SSDs while average latency is only 10% higher. To the best of our knowledge, NP-RDMA is the first efficient and application-transparent software approach to removing memory pinning with commodity RDMA NICs.
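
The central idea in the abstract is to detect page faults in software rather than on the NIC. A small model helps illustrate it. The Python sketch that follows is a hypothetical simulation and not NP-RDMA's implementation. The names `RegionTracker`, `on_invalidate`, `_fault_in`, and `post_send`, along with the 4 KiB page size, are all assumptions. In the sketch, `on_invalidate` stands in for an MMU-notifier invalidate callback. `_fault_in` stands in for the software fault path. The IOMMU/SMMU remapping step is not modeled.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class RegionTracker:
    """Hypothetical page-state table for one non-pinned memory region."""
    base: int
    length: int
    resident: set[int] = field(default_factory=set)  # page indices known to be mapped
    faults_handled: int = 0

    def _page_range(self, addr: int, size: int) -> range:
        # Page indices covering [addr, addr + size), clipped to the region.
        num_pages = (self.length + PAGE_SIZE - 1) // PAGE_SIZE
        first = max(0, (addr - self.base) // PAGE_SIZE)
        last = min(num_pages - 1, (addr + size - 1 - self.base) // PAGE_SIZE)
        return range(first, last + 1)

    def on_invalidate(self, start: int, end: int) -> None:
        # Models an MMU-notifier invalidate callback: the kernel is about to
        # swap out or migrate [start, end), so those pages are no longer trusted.
        for page in self._page_range(start, end - start):
            self.resident.discard(page)

    def _fault_in(self, page: int) -> None:
        # Stand-in for software fault handling. A real system would touch the
        # page so the OS maps it, then refresh the IOMMU/SMMU entry (not modeled).
        self.resident.add(page)
        self.faults_handled += 1

    def post_send(self, addr: int, size: int) -> int:
        """Check pages before a verb is posted; return how many were faulted in."""
        if size <= 0 or addr < self.base or addr + size > self.base + self.length:
            raise ValueError("buffer outside registered region")
        missing = [p for p in self._page_range(addr, size) if p not in self.resident]
        for page in missing:
            self._fault_in(page)
        # Fast path: with no missing pages, the verb would go straight to the NIC.
        return len(missing)
```

In this model, the first access to a buffer faults its pages in. Repeated accesses take the fast path. An invalidation forces only the affected page back through the fault path. The cost of the common case is therefore a table lookup rather than a NIC-side fault.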

NP-RDMA: Using Commodity RDMA without Pinning Memory