Abstract: Software-managed heterogeneous memory (HM) is a promising way to increase memory capacity and cost efficiency. However, unlocking the performance potential of HM requires solving a data management problem: given an application with multiple execution phases, each with a possibly distinct working set, data must be moved between the memory components of HM to optimize performance. Deep neural networks (DNNs), a common data center workload, make data management on HM especially challenging. DNN workloads often employ a task dataflow execution model and feature a large number of small data objects and fine-grained operations (tasks), which complicates both memory profiling and efficient data migration. We present Sentinel, a runtime system that automatically optimizes data migration (i.e., data management) on HM to achieve performance similar to that of a fast memory-only system while using a much smaller capacity of fast memory. To achieve this, Sentinel exploits domain knowledge about deep learning to adopt a custom approach to data management. Sentinel leverages workload repeatability to break the dilemma between profiling accuracy and overhead. By controlling memory allocation, it enables profiling and data migration at the granularity of data objects (not pages), which bridges the semantic gap between the operating system and applications. By associating data objects with the DNN topology, Sentinel avoids unnecessary data movement and triggers data movement proactively. Using only 20% of the peak memory consumption of DNN models as the fast memory size, Sentinel achieves the same or comparable performance (at most 8% performance difference) as a fast memory-only system on common DNN models; Sentinel also consistently outperforms a state-of-the-art solution by 18%.
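
The idea of profiling one iteration at data-object granularity and then using the layer order to move objects ahead of their use can be illustrated with a small sketch. This is not Sentinel's actual algorithm, which the abstract does not specify. The function names (profile_iteration, next_use, plan_migrations), the trace format, the unit tensor sizes, and the "evict the object whose next use is farthest away" policy are all assumptions made for illustration.

```python
from collections import defaultdict

def profile_iteration(trace):
    """Record, per tensor, the ordered list of layers that touch it.

    `trace` is a hypothetical list of (layer_index, tensor_name) accesses
    captured during one training iteration; later iterations are assumed
    to repeat the same pattern.
    """
    uses = defaultdict(list)
    for layer, tensor in trace:
        if not uses[tensor] or uses[tensor][-1] != layer:
            uses[tensor].append(layer)
    return dict(uses)

def next_use(uses, tensor, layer):
    """Next layer after `layer` that touches `tensor`, or infinity if none."""
    later = [l for l in uses[tensor] if l > layer]
    return later[0] if later else float("inf")

def plan_migrations(uses, sizes, num_layers, fast_capacity):
    """Plan, for each layer, the moves to perform before it runs.

    Assumed policy: bring in every tensor the layer needs; when fast memory
    is full, evict the resident tensor whose next use is farthest away.
    """
    needed = defaultdict(list)
    for tensor, layers in uses.items():
        for layer in layers:
            needed[layer].append(tensor)

    in_fast, used, plan = set(), 0, []
    for layer in range(num_layers):
        actions = []
        for tensor in sorted(needed[layer]):
            if tensor in in_fast:
                continue
            while used + sizes[tensor] > fast_capacity:
                candidates = [t for t in in_fast if t not in needed[layer]]
                if not candidates:
                    break  # nothing evictable; tensor stays in slow memory
                victim = max(candidates, key=lambda t: (next_use(uses, t, layer), t))
                in_fast.remove(victim)
                used -= sizes[victim]
                actions.append(("evict", victim))
            if used + sizes[tensor] <= fast_capacity:
                in_fast.add(tensor)
                used += sizes[tensor]
                actions.append(("prefetch", tensor))
        plan.append(actions)
    return plan

# Toy three-layer model: weights w0..w2, activations a0..a1, unit sizes.
trace = [(0, "w0"), (0, "a0"), (1, "a0"), (1, "w1"), (1, "a1"), (2, "a1"), (2, "w2")]
sizes = {name: 1 for _, name in trace}
plan = plan_migrations(profile_iteration(trace), sizes, num_layers=3, fast_capacity=3)
for layer, actions in enumerate(plan):
    print(layer, actions)
```

With a fast-memory budget of three objects out of five, the plan evicts w0 once layer 0 has finished with it and keeps a1 resident across layers 1 and 2. Because it works on named objects rather than pages, the plan never moves an object that the upcoming layer does not need.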

Graph Neural Networks Based Memory Inefficiency Detection Using Selective Sampling

GRAPHSPY: Fused Program Semantic-Level Embedding via Graph Neural Networks for Dead Store Detection

GRAPHSPY - Fused Program Semantic Embedding Through Graph Neural Networks for Memory Efficiency

Memory-Efficient Performance Monitoring on Programmable Switches with Lean Algorithms

Online Memory Leak Detection in the Cloud-based Infrastructures

G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing

GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing

A Memory Hierarchical Layer Assigning and Prefetching Technique to Overcome the Memory Performance/Energy Bottleneck

Spindle: Informed Memory Access Monitoring

Efficient Memory Management for Deep Neural Net Inference

Disaggregated Memory with SmartNIC Offloading: a Case Study on Graph Processing

WELDER: Scheduling Deep Learning Memory Access Via Tile-graph

Who Ate My Memory? Towards Attribution in Memory Management

Examem: Low-Overhead Memory Instrumentation for Intelligent Memory Systems

Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning

Pinpointing the Memory Behaviors of DNN Training

MAGIS: Memory Optimization Via Coordinated Graph Transformation and Scheduling for DNN

GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing

Memory-Efficient Community Detection on Large Graphs Using Weighted Sketches

Puppeteer: A Random Forest-based Manager for Hardware Prefetchers across the Memory Hierarchy