Abstract: Disaggregated memory is a promising approach that addresses the limitations of traditional memory architectures by decoupling memory from compute nodes and sharing it across a data center. Cloud platforms have deployed such systems to improve overall memory utilization, but performance can vary across workloads. High-performance computing (HPC) is crucial in scientific and engineering applications, and HPC machines also suffer from underutilized memory. Improving system memory utilization while understanding workload performance is therefore essential for HPC operators, and assessing the potential of a disaggregated memory system before deployment is a critical step. This paper proposes a methodology for exploring the design space of a disaggregated memory system. It incorporates key metrics that affect performance on disaggregated memory systems (memory capacity, the ratio of local to remote memory accesses, injection bandwidth, and bisection bandwidth), providing an intuitive approach to guide machine configurations based on technology trends and workload characteristics. We apply our methodology to analyze thirteen diverse workloads, including AI training, data analysis, genomics, protein, fusion, atomic nuclei, and traditional HPC bookends. The methodology exposes both the potential and the pitfalls of a disaggregated memory system and motivates machine configurations. Our results show that eleven of our thirteen applications can leverage injection bandwidth disaggregated memory without affecting performance, while one pays a rack bisection bandwidth penalty and two pay the system-wide bisection bandwidth penalty. We also show that intra-rack memory disaggregation would meet the applications' memory requirements and provide enough remote memory bandwidth.

EMF: Disaggregated GPUs in Datacenters for Efficiency, Modularity and Flexibility

DxPU: Large Scale Disaggregated GPU Pools in the Datacenter

DaeMon: Architectural Support for Efficient Data Movement in Disaggregated Systems

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

Efficient Resource Sharing Through GPU Virtualization on Accelerated High Performance Computing Systems

Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics

From CPU to GPU: GPU-Based Electromagnetic Computing (GPUECO)

Design and Evaluation of a Rack-Scale Disaggregated Memory Architecture For Data Centers

Evaluating the Potential of Disaggregated Memory Systems for HPC applications

AEML: An Acceleration Engine for Multi-GPU Load-balancing in Distributed Heterogeneous Environment

GPU Domain Specialization via Composable On-Package Architecture

Energy-Efficient Resource Management for Federated Edge Learning With CPU-GPU Heterogeneous Computing

FLARE: Flexibly Sharing Commodity GPUs to Enforce QoS and Improve Utilization

Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application

ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments

A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems

Disaggregated Memory at the Edge

Optimizing power efficiency for 3D stacked GPU-in-memory architecture

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect