Abstract: Prefetching has emerged as one of the most successful techniques for bridging the performance gap between modern processors and memory systems. At the same time, as we move into the deep sub-micron era, power consumption has become one of the most important design constraints besides performance. Data prefetching has been researched intensively with a focus on performance improvement; however, as far as we know, the energy aspects of prefetching have not been fully investigated. This dissertation investigates data prefetching techniques for next-generation processors, targeting both energy efficiency and performance speedup. We first evaluate a number of state-of-the-art data prefetching techniques from an energy perspective and identify the main energy-consuming components attributable to prefetching. We then propose a set of compiler-assisted, energy-aware techniques that make hardware-based data prefetching more energy-efficient. Our evaluation shows that if leakage is optimized with recently proposed circuit-level techniques, most of the energy overhead of hardware data prefetching comes from two sources: the prefetch hardware itself, and unnecessary L1 data cache lookups caused by prefetches that hit in the L1 cache. This energy overhead on the memory system can be as much as 30%. We propose a set of power-aware prefetch filtering techniques to reduce the energy overhead of hardware data prefetching. These include three compiler-based filtering approaches that make the prefetch predictor more energy-efficient, and a hardware-based filtering technique that further reduces the energy overhead due to unnecessary prefetching in the L1 data cache. Combined, the energy-aware filtering techniques can remove up to 40% of the energy overhead introduced by aggressive prefetching, with almost no performance degradation.
We also develop a location-set driven data prefetching technique to further reduce the energy consumption of the prefetching hardware. This scheme uses a power-aware prefetch engine built around a novel indexed hardware history table. With the help of compiler-based location-set analysis, we show that the proposed prefetching scheme reduces the energy consumed by the prefetch history table by 7-11X with very little impact on performance. Our experiments show that the proposed techniques can overcome the prefetching-related energy overhead in most applications, improving the energy-delay product by 33% on average. For many of the applications studied, our work transforms data prefetching from a performance improvement mechanism alone into an energy-saving technique as well.
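For readers unfamiliar with the metric, the energy-delay product (EDP) can be written out as follows. This is a sketch of the standard definition, not a formula taken from the dissertation; the symbols E (energy consumed), T (execution time), and the subscripts "base" and "new" are notation assumed here, and the reported 33% is read as a 33% reduction in EDP.

```latex
\mathrm{EDP} = E \cdot T,
\qquad
\frac{\mathrm{EDP}_{\mathrm{new}}}{\mathrm{EDP}_{\mathrm{base}}}
  = \frac{E_{\mathrm{new}}}{E_{\mathrm{base}}}
    \cdot \frac{T_{\mathrm{new}}}{T_{\mathrm{base}}}
% Hypothetical illustration (these ratios are NOT results from the dissertation):
% E_new / E_base = 0.80 and T_new / T_base = 0.84 give
% 0.80 * 0.84 = 0.672, i.e. roughly a 33% lower EDP.
```

Because EDP multiplies the two ratios, a technique improves it only if energy and execution time together shrink; a speedup bought with a proportionally larger energy cost leaves EDP unchanged or worse.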

Deep learning based data prefetching in CPU-GPU unified virtual memory

GPUVM: GPU-driven Unified Virtual Memory

Coordinated Page Prefetch and Eviction for Memory Oversubscription Management in GPUs

Fine-Grain Quantitative Analysis of Demand Paging in Unified Virtual Memory

A Virtual Multi-Channel GPU Fair Scheduling Method for Virtual Machines

In-depth analyses of unified virtual memory system for GPU accelerated computing

Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching

An Efficient Hardware Prefetcher Exploiting the Prefetch Potential of Long-Stride Access Pattern on Virtual Address

Differential-Matching Prefetcher for Indirect Memory Access

Hyperion: A Highly Effective Page and PC Based Delta Prefetcher

Gaze into the Pattern: Characterizing Spatial Patterns with Internal Temporal Correlations for Hardware Prefetching

Performance Evaluation of Advanced Features in CUDA Unified Memory

Dynamic Data Prefetching for Java Virtual Machine on Many-core Architecture

Revisiting Data Prefetching for Database Systems with Machine Learning Techniques

SGDP: A Stream-Graph Neural Network Based Data Prefetcher

Compiler-assisted Hardware-Based Data Prefetching for Next Generation Processors

Data Cache Prefetching with Perceptron Learning

A readahead prefetcher for GPU file system layer

A novel hardware prefetching scheme exploiting 2-D spatial locality in multimedia applications

Exploring Data Prefetching Mechanisms for Last Level Cache in Chip Multi-Processors

Oversubscribing GPU Unified Virtual Memory: Implications and Suggestions