Abstract: Energy efficiency is becoming a major constraint in processor design. Every component of the processor should be reconsidered to reduce wasted energy and area. Prefetching is an important technique for tolerating memory latency, and prefetcher design has an important impact on the energy efficiency of the memory hierarchy. Stride prefetchers require little storage, but cannot handle irregular access patterns. Delta correlation (DC) prefetchers can handle complicated access patterns, but waste storage because they store multiple miss addresses for a single stride pattern. Moreover, DC prefetchers waste the bandwidth and energy of the memory hierarchy because they cannot identify whether an address has already been prefetched, and therefore generate a large number of redundant prefetches. In this paper, we propose a storage- and energy-efficient data prefetcher called stride/DC (S/DC) that combines the advantages of stride and DC prefetchers. S/DC uses a pattern prediction table (PPT) that stores two recent miss addresses in each entry to capture stride patterns. The PPT avoids recording multiple miss addresses for a stride pattern, and thus improves storage efficiency. When handling stride patterns, each PPT entry maintains a counter from which the last prefetched address is obtained, so that redundant prefetches are not generated. When handling other patterns, S/DC compares each newly predicted address with the addresses already in the prefetch queue and filters out the redundant ones. In addition, to expand the filtering scope, S/DC uses a prefetch filter to store addresses evicted from the prefetch queue. In this way, S/DC reduces the bandwidth requirements and energy consumption of prefetching. Experimental results demonstrate that, compared with the CZone/DC prefetcher, S/DC achieves comparable performance with only 24% of the storage and reduces L2 cache energy by 11.46%.
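The two redundancy-avoidance ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's design: the abstract does not specify the table organization or the exact meaning of the counter, so the names `PPTEntry`, `stride_prefetches`, and `RedundancyFilter`, the `degree` parameter, the queue and filter sizes, and the reading of the counter as "strides already prefetched beyond the last recorded address" are all assumptions made for illustration. The delta correlation predictor itself is not modeled.

```python
from collections import OrderedDict, deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PPTEntry:
    """One hypothetical PPT entry: two recent miss addresses plus a counter."""
    prev_miss: Optional[int] = None  # older recorded miss address
    last_miss: Optional[int] = None  # most recent recorded miss address
    ahead: int = 0  # ASSUMED meaning: strides already prefetched past last_miss


def stride_prefetches(entry: PPTEntry, addr: int, degree: int = 2) -> List[int]:
    """Train the entry with a new address; return only not-yet-issued prefetches."""
    out: List[int] = []
    if entry.prev_miss is not None and entry.last_miss is not None:
        old_stride = entry.last_miss - entry.prev_miss
        new_stride = addr - entry.last_miss
        if new_stride != 0 and new_stride == old_stride:
            # Advancing by one stride consumes one already-prefetched address.
            covered = max(entry.ahead - 1, 0)
            # Issue only addresses beyond the last prefetched one.
            out = [addr + k * new_stride for k in range(covered + 1, degree + 1)]
            entry.ahead = max(covered, degree)
        else:
            entry.ahead = 0  # stride broken: nothing is covered any more
    entry.prev_miss, entry.last_miss = entry.last_miss, addr
    return out


class RedundancyFilter:
    """Prefetch queue plus a small filter holding addresses evicted from it."""

    def __init__(self, queue_size: int = 4, filter_size: int = 8) -> None:
        self.queue: deque = deque()
        self.queue_size = queue_size
        self.filter: OrderedDict = OrderedDict()
        self.filter_size = filter_size

    def try_enqueue(self, addr: int) -> bool:
        """Return False if addr is redundant, else enqueue it and return True."""
        if addr in self.queue or addr in self.filter:
            return False
        if len(self.queue) == self.queue_size:
            evicted = self.queue.popleft()
            self.filter[evicted] = True  # remember evicted address
            if len(self.filter) > self.filter_size:
                self.filter.popitem(last=False)  # drop the oldest filter entry
        self.queue.append(addr)
        return True


entry = PPTEntry()
for a in (100, 164, 228, 292):
    print(a, stride_prefetches(entry, a))
# 100 []
# 164 []
# 228 [292, 356]
# 292 [420]
```

In this sketch the fourth address yields only 420, because the counter records that 356 was already issued; a stride prefetcher without such a counter would request 356 again. The `RedundancyFilter` plays the corresponding role for non-stride predictions, rejecting any address that is still in the queue or was recently evicted from it.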

Hyperion: A Highly Effective Page and PC Based Delta Prefetcher

BTIP: Branch Triggered Instruction Prefetcher Ensuring Timeliness

Helper Without Threads: Customized Prefetching for Delinquent Irregular Loads

Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses

Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher

Prefetching Techniques for STT-RAM Based Last-Level Cache in CMP Systems

Puppeteer: A Random Forest-based Manager for Hardware Prefetchers across the Memory Hierarchy

Software Prefetching for Indirect Memory Accesses

Applying Deep Learning to the Cache Replacement Problem

Data Cache Prefetching with Perceptron Learning

An Approach to Data Prefetching Using 2-Dimensional Selection Criteria

Reducing Load Latency with Cache Level Prediction

S/DC: A Storage and Energy Efficient Data Prefetcher

Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching

Gaze into the Pattern: Characterizing Spatial Patterns with Internal Temporal Correlations for Hardware Prefetching

Practical Temporal Prefetching With Compressed On-Chip Metadata

Exploring dynamic program locality with Lookahead Cache for filling memory gap

A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering

A hybrid cache architecture with 2D-based prefetching scheme for image and video processing

AMC: Access to Miss Correlation Prefetcher for Evolving Graph Analytics

PARE: a power-aware hardware data prefetching engine