Abstract: The large working sets of commercial and scientific workloads favor a shared L2 cache design that maximizes aggregate cache capacity and minimizes off-chip memory requests in chip multiprocessors (CMPs). Two important hurdles restrict the scalability of these chip multiprocessors: the on-chip memory cost of the directory and long L1 miss latencies. This work presents a network caching architecture that addresses both problems. Network caching takes advantage of the on-chip network to manage shared data blocks and directory information in chip multiprocessors. The architecture removes the directory structure from the shared L2 caches and instead stores directory information for blocks recently cached by L1 caches in the network interface components, which decreases on-chip directory memory overhead and improves scalability. The saved memory space is used for shared data caches or victim caches that are embedded in the network interface components to further reduce L1 miss latencies. This paper develops three network caching designs to reduce L1 miss latencies. The proposed architecture is evaluated through simulations of a 16-core tiled CMP. First, we demonstrate that the network caching architecture provides good scalability. Second, the network caching architecture also provides robust performance. Third, different network caching designs have distinct impacts on CMP performance. Compared with the traditional shared L2 cache design, the network victim cache (NVC) design improves performance by 23% on average and by up to 34%. The network shared cache (NSC) design improves performance by 6% on average and by up to 16%. The network directory cache (NDC) design improves performance by 4% on average and by up to 11%.
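To make the victim-cache idea concrete, the following is a minimal sketch of one tile in which blocks evicted from the L1 are kept in a small cache at the network interface and probed on an L1 miss before the request would travel to the home L2 bank. The abstract does not specify the NVC organization, so everything here is an assumption made for illustration: the names `LRUCache` and `Tile`, the fully associative LRU policy, the capacities, and the choice to move a block back into the L1 on a victim-cache hit. Coherence and directory handling are omitted.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative LRU store keyed by block address (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, addr):
        """Return True on a hit and mark the block as most recently used."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return True
        return False

    def insert(self, addr):
        """Insert addr; return the evicted block address, or None."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return None
        self.blocks[addr] = True
        if len(self.blocks) > self.capacity:
            victim, _ = self.blocks.popitem(last=False)  # drop least recently used
            return victim
        return None

    def remove(self, addr):
        self.blocks.pop(addr, None)


class Tile:
    """One tile: a private L1 plus an assumed victim cache at the network interface."""

    def __init__(self, l1_blocks, nvc_blocks):
        self.l1 = LRUCache(l1_blocks)
        self.nvc = LRUCache(nvc_blocks)
        self.stats = {"l1_hit": 0, "nvc_hit": 0, "remote_l2": 0}

    def access(self, addr):
        if self.l1.lookup(addr):
            self.stats["l1_hit"] += 1
            return "l1_hit"
        # L1 miss: probe the network-interface victim cache first.
        if self.nvc.lookup(addr):
            self.nvc.remove(addr)      # assumed: block moves back into the L1
            outcome = "nvc_hit"
        else:
            outcome = "remote_l2"      # request would go to the home L2 bank
        self.stats[outcome] += 1
        # Fill the L1; any block it evicts is captured by the victim cache.
        victim = self.l1.insert(addr)
        if victim is not None:
            self.nvc.insert(victim)
        return outcome


tile = Tile(l1_blocks=2, nvc_blocks=2)
trace = [tile.access(a) for a in (1, 2, 3, 1, 3)]
print(trace)
print(tile.stats)
```

In this toy trace the fourth access finds block 1 in the victim cache rather than requiring a trip across the network, which is the kind of L1 miss the NVC design is intended to shorten.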

Lowering Latency of Embedded Memory by Exploiting In-Cell Victim Cache Hierarchy Based on Emerging Multi-Level Memory Devices

Enhancing Lifetime and Performance of MLC NVM Caches using Embedded Trace buffers

Network Victim Cache: Leveraging Network-on-Chip for Managing Shared Caches in Chip Multiprocessors

Victor: A Variation-resilient Approach Using Cell-Clustered Charge-domain computing for High-density High-throughput MLC CiM

DASH: A duplication-aware flash cache architecture in virtualization environment

MScache: A buffer management scheme based on page-level address mapping for NAND-flash SSD

MBSA: a Lightweight and Flexible Storage Architecture for Virtual Machines

Adaptive Circuit Approaches to Low-Power Multi-Level/Cell FeFET Memory

A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems

Exploiting Narrow-Width Values for Improving Non-Volatile Cache Lifetime

A Method for Hiding the Increased Non-Volatile Cache Read Latency

TriZone: A Design of MLC STT-RAM Cache for Combined Performance, Energy, and Reliability Optimizations

A Spatial and Temporal Locality-Aware Adaptive Cache Design with Network Optimization for Tiled Many-Core Architectures

A Unified Write Buffer Cache Management Scheme for Flash Memory

A Latency-Aware Garbage Collection Strategy

A Cache Management Strategy to Replace Wear Leveling Techniques for Embedded Flash Memory

Improve LLC Bypassing Performance by Memory Controller Improvements in Heterogeneous Multicore System

Hardware Memory Management for Future Mobile Hybrid Memory Systems

Statistical Cache Bypassing for Non-Volatile Memory

Network caching for Chip Multiprocessors

ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model