Abstract: Using analytical models to evaluate proposals or to guide high-level architecture decisions is becoming increasingly attractive. Over the last decade, a number of methods have emerged for modeling cache behavior and deriving quantified insights, such as stack distance theory and memory-level parallelism (MLP) estimation. However, prior research typically oversimplified the factors that must be considered in out-of-order processors, such as the effects of reordered memory instructions, multiple dependences among memory instructions, and accesses merged into the same MSHR entry. Ignoring these influences leads to low and unstable accuracy in recent analytical models. By quantifying these effects, this article proposes a cache performance evaluation framework with three analytical models, which more accurately predict cache misses, MLP, and the average cache miss service time, respectively. As in prior studies, these analytical models are all fed with profiled software characteristics, so the architecture evaluation process is significantly faster than cycle-accurate simulation. We evaluate the accuracy of the proposed models against gem5 cycle-accurate simulations using 16 benchmarks chosen from Mobybench Suite 2.0, Mibench 1.0, and Mediabench II. The average root mean square errors for predicting cache misses, MLP, and the average cache miss service time are around 4%, 5%, and 8%, respectively. Meanwhile, the average error of our framework in predicting the stall time due to cache misses is as low as 8%. The whole cache performance estimation is about 15 times faster than gem5 cycle-accurate simulation and 4 times faster than recent studies. Furthermore, we use our models to study the relationship between different performance metrics and the reorder buffer size. As an application case, we also demonstrate how to combine our framework with McPAT to find Pareto-optimal configurations in cache design space exploration.

Reconfigurable cache for real-time MPSoCs: Scheduling and implementation

Automatic Cache Partitioning and Time-Triggered Scheduling for Real-Time MPSoCs

Shared L2 Cache Management in Multicore Real-Time System

The Design of Reconfigurable Cache Scheme in Multi-core Processor

Cache-aware Scheduling and Analysis for Multicores

Cache-conscious off-line real-time scheduling for multi-core platforms: algorithms and implementation

Online Cache Modeling for Commodity Multicore Processors

Reconfigurable Mpb Combined with Cache Coherence Protocol in Many-Core

Dynamically Reconfigurable Cache for Low-Power Embedded System

Online Scheduling for Multi-Core Shared Reconfigurable Fabric

Co-Optimizing Cache Partitioning and Multi-Core Task Scheduling: Exploit Cache Sensitivity or Not?

A Software Method Of Reconfigurable Technology On Soc Cache

Fast On-Line Real-Time Scheduling Algorithm for Reconfigurable Computing

An Adjustable Fine-Grain Cache Assignment Scheduling Algorithm Based on Multi-core Architecture

An Analytical Cache Performance Evaluation Framework for Embedded Out-of-Order Processors Using Software Characteristics

Dynamical reconfigurable cache architecture with low-power

A Shared Cache-Aware Hybrid Real-Time Scheduling on Multicore Platform with Hierarchical Cache

MiCache: an MSHR-inclusive Non-blocking Cache Design for FPGAs

Improving the Performance of Adaptive Cache in Reconfigurable VLIW Processor

A Comprehensive Memory Management Framework for CPU-FPGA Heterogenous SoCs

HSCS: a Hybrid Shared Cache Scheduling Scheme for Multiprogrammed Workloads