Abstract: Remote memory techniques for datacenter applications have recently gained a great deal of popularity. Existing remote memory techniques focus on the efficiency of a single application setting only. However, when multiple applications co-run on a remote-memory system, significant interference could occur, resulting in unexpected slowdowns even if the same amounts of physical resources are granted to each application. This slowdown stems from massive sharing in applications' swap data paths. Canvas is a redesigned swap system that fully isolates swap paths for remote-memory applications. Canvas allows each application to possess its dedicated swap partition, swap cache, prefetcher, and RDMA bandwidth. Swap isolation lays a foundation for adaptive optimization techniques based on each application's own access patterns and needs. We develop three such techniques: (1) adaptive swap entry allocation, (2) semantics-aware prefetching, and (3) two-dimensional RDMA scheduling. A thorough evaluation with a set of widely-deployed applications demonstrates that Canvas minimizes performance variation and dramatically reduces performance degradation.

What problem does this paper attempt to address?

This paper addresses the performance interference that arises when multiple applications co-run on a remote-memory system. It identifies three sources of interference:

1. **Severe lock contention**: In current swap systems, applications share swap resources such as swap partitions and RDMA. Because swap entries must be allocated frequently, this sharing causes severe lock contention, which lowers throughput and prevents RDMA bandwidth from being fully utilized. For example, during windows of frequent remote access, an application may spend up to 70% of its time obtaining swap entries.
2. **Uncontrolled use of swap resources** (e.g., RDMA bandwidth): Shared RDMA bandwidth is often dominated by applications with many threads that perform frequent remote accesses simultaneously. For example, aggressively fetching or prefetching pages for one application may disproportionately reduce the bandwidth available to the others. Even within a single application, prefetching and demand swapping compete for resources, which can lengthen fault handling or delay prefetching so that pages are not brought back in time.
3. **Reduced prefetching effectiveness**: Current kernel prefetchers are built around low-level access patterns (such as sequential or strided), which suit applications that use arrays extensively. However, many cloud applications are written in high-level managed languages (such as Java or Python), and their accesses come from multiple threads or exhibit pointer-chasing behavior rather than sequential or strided patterns. A single prefetching strategy therefore struggles when multiple applications co-run. For example, running Spark together with native applications reduces the prefetching contribution of Leap by 3.19x.

To address these problems, the paper proposes Canvas, a redesigned swap system that fully isolates swap paths by giving each application a dedicated swap partition, swap cache, prefetcher, and RDMA bandwidth. On top of this isolation, Canvas develops three adaptive optimization techniques: (1) adaptive swap entry allocation, (2) semantics-aware prefetching, and (3) two-dimensional RDMA scheduling. Each technique adapts to an application's own access patterns and needs, which minimizes performance variation and significantly reduces performance degradation.
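To make the scheduling idea more concrete, here is a minimal Python sketch of how a scheduler might arbitrate RDMA requests along two dimensions: across applications and, within one application, between demand and prefetch requests. It is an illustration under stated assumptions, not Canvas's actual implementation. The summary above does not spell out what the two dimensions are, so this reading is inferred from problem 2. All names (`AppQueues`, `TwoDimScheduler`, `weight`, `served`) are hypothetical, and the policy shown (least weighted service first, demand before prefetch) is a simple stand-in.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AppQueues:
    """Per-application request queues (hypothetical structure)."""
    weight: float = 1.0                              # assumed bandwidth share
    demand: deque = field(default_factory=deque)     # page faults a thread is blocked on
    prefetch: deque = field(default_factory=deque)   # speculative reads
    served: float = 0.0                              # weighted count of requests served


class TwoDimScheduler:
    """Toy scheduler: fair across apps, demand-first within an app."""

    def __init__(self):
        self.apps = {}

    def register(self, name, weight=1.0):
        self.apps[name] = AppQueues(weight=weight)

    def submit(self, name, page, kind):
        q = self.apps[name]
        (q.demand if kind == "demand" else q.prefetch).append(page)

    def next_request(self):
        # Dimension 1 (across applications): among apps with pending work,
        # pick the one with the least weighted service so far. Ties are
        # broken by application name.
        ready = [(q.served, name) for name, q in self.apps.items()
                 if q.demand or q.prefetch]
        if not ready:
            return None
        _, name = min(ready)
        q = self.apps[name]
        # Dimension 2 (within an application): demand requests go before
        # prefetch requests, since a thread is waiting on them.
        if q.demand:
            kind, page = "demand", q.demand.popleft()
        else:
            kind, page = "prefetch", q.prefetch.popleft()
        q.served += 1.0 / q.weight
        return name, kind, page


if __name__ == "__main__":
    sched = TwoDimScheduler()
    sched.register("spark")
    sched.register("memcached")
    for p in range(4):
        sched.submit("spark", p, "prefetch")   # aggressive prefetching
    sched.submit("spark", 100, "demand")
    sched.submit("memcached", 200, "demand")
    sched.submit("memcached", 201, "demand")
    while (req := sched.next_request()) is not None:
        print(req)
```

In this toy run, the prefetch-heavy application cannot crowd out the other one: requests alternate between the two applications while both have pending work, and each application's demand requests are issued before its own prefetches. A real system would also need to handle request sizes, in-flight limits, and applications that go idle and return, all of which this sketch ignores.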

Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory
