Abstract: Prefetching has emerged as one of the most successful techniques for bridging the gap between modern processors and memory systems. At the same time, as we move into the deep sub-micron era, power consumption has become one of the most important design constraints alongside performance. Extensive research on data prefetching has focused on performance improvement; however, as far as we know, the energy aspects of prefetching have not been fully investigated. This dissertation investigates data prefetching techniques for next-generation processors, targeting both energy efficiency and performance speedup. We first evaluate a number of state-of-the-art data prefetching techniques from an energy perspective and identify the main energy-consuming components due to prefetching. We then propose a set of compiler-assisted energy-aware techniques to make hardware-based data prefetching more energy-efficient. Our evaluation shows that, if leakage is optimized with recently proposed circuit-level techniques, most of the energy overhead of hardware data prefetching comes from prefetch-hardware costs and from unnecessary L1 data cache lookups for prefetches that hit in the L1 cache. This energy overhead on the memory system can be as much as 30%. We propose a set of power-aware prefetch filtering techniques to reduce the energy overhead of hardware data prefetching. These include three compiler-based filtering approaches that make the prefetch predictor more energy-efficient, and a hardware-based filtering technique that further reduces the energy overhead of unnecessary prefetching in the L1 data cache. Combined, the energy-aware filtering techniques reduce the energy overhead introduced by aggressive prefetching by up to 40% with almost no performance degradation.
We also develop a location-set driven data prefetching technique to further reduce the energy consumption of the prefetching hardware. This scheme uses a power-aware prefetch engine built around a novel indexed hardware history table. With the help of compiler-based location-set analysis, we show that the proposed prefetching scheme reduces the energy consumed by the prefetch history table by 7-11X with very small impact on performance. Our experiments show that the proposed techniques can overcome the prefetching-related energy overhead in most applications, improving the energy-delay product by 33% on average. For many of the applications studied, our work transforms data prefetching from a performance improvement mechanism alone into an energy-saving technique as well.
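As a reading aid for the energy-delay product (EDP) figure quoted above, the following minimal sketch computes EDP as energy multiplied by execution time and reports the relative improvement of one configuration over another. The function names and the normalized input values are hypothetical and chosen only for illustration; they are not measurements from the dissertation.

```python
def energy_delay_product(energy: float, delay: float) -> float:
    """Return the energy-delay product (energy * delay); lower is better."""
    if energy < 0 or delay < 0:
        raise ValueError("energy and delay must be non-negative")
    return energy * delay


def edp_improvement(baseline_edp: float, new_edp: float) -> float:
    """Return the fractional EDP reduction relative to the baseline."""
    if baseline_edp <= 0:
        raise ValueError("baseline EDP must be positive")
    return 1.0 - new_edp / baseline_edp


# Hypothetical values, normalized to a baseline without prefetching.
baseline = energy_delay_product(energy=1.0, delay=1.0)

# Assumed example: a configuration that uses 20% less energy and
# runs 16% faster than the baseline (illustrative numbers only).
candidate = energy_delay_product(energy=0.80, delay=0.84)

print(f"EDP improvement: {edp_improvement(baseline, candidate):.1%}")
```

Because EDP multiplies the two quantities, a technique improves it only if any extra energy it spends is outweighed by the time it saves, or the reverse. That is why lowering prefetch-related energy overhead while keeping the speedup can turn prefetching into a net energy saving.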

A Comprehensive Study of Executing Ahead Mechanism for In-Order Microprocessors

An Energy-Efficient Executing Ahead Mechanism for Improving the Performance of Single-Issue In-Order Microprocessors

Pre-Execution Directed Prefetching for In-Order Processors

A Pre-Execution Mechanism Based on Value Prediction and Instruction Reuse for In-Order Processors

Lookahead Cache with Instruction Processing Unit for Filling Memory Gap

Design and Implementation of A High-Performance Microprocessor Cache Compression Algorithm

Global Data Access Optimization Via Load/Store Instruction Extension

An Energy-Efficient Combining Way Selective Technique for the Instruction Cache in Superscalar Microprocessors

Efficient Instruction Scheduling Using Real-time Load Delay Tracking

Cache Miss Reduction Through Hardware-Assisted Loop Optimization

Performance Evaluation and Optimization of Cache Architecture for Simultaneous Multithreading Processor

Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues

Exploring Data Prefetching Mechanisms for Last Level Cache in Chip Multi-Processors

Exploring dynamic program locality with Lookahead Cache for filling memory gap

An Energy-Efficient Instruction Scheduler Design with Two-Level Shelving and Adaptive Banking

Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency

Compiler-assisted Hardware-Based Data Prefetching for Next Generation Processors

Optimization of software data prefetching in the IA-64 architecture

Prefetching Techniques for STT-RAM Based Last-Level Cache in CMP Systems

C-Pack: A High-Performance Microprocessor Cache Compression Algorithm

Focusing processor policies via critical-path prediction