Abstract: Unified Virtual Memory (UVM) relieves developers of the burden of maintaining complex data structures and managing explicit data migration by enabling on-demand data movement between CPU memory and GPU memory. However, on-demand paging quickly becomes a performance bottleneck for UVM because of the high latency of page table walks and data migration over the interconnect. Prefetching is considered a promising solution to this problem, given its ability to exploit the locality of program memory access patterns. However, existing locality-based prefetching schemes cannot handle every situation. An ideal prefetcher should look beyond narrow regions of the requested address space and also capture global context in order to predict the memory access pattern well. This paper proposes a novel deep learning approach to page prefetching for UVM. We first show that a powerful Transformer model can provide high accuracy for UVM page prefetching. We then analyze and interpret this Transformer model and derive several insights that allow us to design a simpler model that matches the unconstrained model's accuracy at orders of magnitude lower cost. We evaluate this simplified model on a set of 11 memory-intensive benchmarks from popular benchmark suites. Our solution outperforms the state-of-the-art UVM framework, improving the performance by 10.89%, improving the device memory page hit rate by 16.98% (89.02% vs. 76.10% for prior art), and reducing the CPU-GPU interconnect traffic by 11.05%. According to our proposed unified metric, which combines the accuracy, coverage, and page hit rate, our solution comes closer to the ideal prefetching scheme than the state-of-the-art design (0.90 vs. 0.85, with a perfect prefetcher scoring 1.0).

... page migration, zero-copy, and tree-based page prefetching. We compare IPC, page hit rate, CPU-GPU interconnect usage, and unity of benchmark ...

ABMLP: Attention-Based Multi-Layer Perceptron Prefetcher

AMPP: an Adaptive Multilayer Perceptron Prefetcher for Irregular Data Prefetching

Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching

Data Cache Prefetching with Perceptron Learning

Phases, Modalities, Temporal and Spatial Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics

A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering

A Prefetch-Adaptive Intelligent Cache Replacement Policy Based on Machine Learning

Algorithm/Architecture of NN-Based Configuration Prefetching

Revisiting Data Prefetching for Database Systems with Machine Learning Techniques

Learning Memory Access Patterns

A New Technology of Multi-core Prefetching

AAP and AAPM: Improved Prefetching Structures of the L2 Cache

A New Prefetching Strategy Based on Access Density in Linux

Characterizing Machine Learning-Based Runtime Prefetcher Selection

G&L: an Attention-based Model for Improving Prefetching in Solid-state Drives

A Fairness-Aware Prefetching Mechanism Based on Reinforcement Learning for Multi-Core Systems

An Efficient Data Prefetch Strategy for Deep Learning Based on Non-volatile Memory

Deep learning based data prefetching in CPU-GPU unified virtual memory

A Prefetching Strategy Based on LMS Rule

Prefetching Policy Using Miss Queue Information

Bayesian Theory Based Adaptive Proximity Data Accessing for CMP Caches