Abstract: The large working sets of commercial and scientific workloads favor a shared L2 cache design that maximizes aggregate cache capacity and minimizes off-chip memory requests in chip multiprocessors (CMPs). Two important hurdles restrict the scalability of these chip multiprocessors: the on-chip memory cost of the directory and long L1 miss latencies. This work presents a network caching architecture that addresses both problems. Network caching takes advantage of on-chip networks to manage shared data blocks and directory information in chip multiprocessors. The architecture removes the directory structure from the shared L2 caches and stores directory information for the blocks recently cached by L1 caches in the network interface components, which decreases on-chip directory memory overhead and improves scalability. The saved memory space is used for shared data caches or victim caches, which are embedded in the network interface components to further reduce L1 miss latencies. This paper develops three network caching designs to reduce L1 miss latencies. The proposed architecture is evaluated through simulations of a 16-core tiled CMP. First, we demonstrate that the network caching architecture provides good scalability. Second, the network caching architecture also provides robust performance. Third, the different network caching designs have distinct impacts on CMP performance. Compared with the traditional shared L2 cache design, the network victim cache (NVC) design improves performance by 23% on average and by up to 34%. The network shared cache (NSC) design improves performance by 6% on average and by up to 16%. The network directory cache (NDC) design improves performance by 4% on average and by up to 11%.

Network caching for Chip Multiprocessors