Abstract: Emerging memory technologies such as STT-RAM, PCRAM, and resistive RAM are being explored as potential replacements for existing on-chip caches or main memories in future multi-core architectures. This is due to the many attractive features these memory technologies possess: high density, low leakage, and non-volatility. However, the latency and energy overhead associated with the write operations of these emerging memories have become a major obstacle to their adoption. Previous works have proposed various circuit-level and architecture-level solutions to mitigate the write overhead. In this paper, we study the integration of STT-RAM in a 3D multi-core environment and propose solutions at the on-chip network level to circumvent the write overhead problem in a cache architecture built with STT-RAM technology. Our scheme is based on the observation that, instead of staggering requests to a write-busy STT-RAM bank, the network should schedule requests to other idle cache banks to effectively hide the latency. Thus, we prioritize cache accesses to the idle banks by delaying accesses to the STT-RAM cache banks that are currently serving long-latency write requests. Through a detailed characterization of the cache access patterns of 42 applications, we propose an efficient mechanism to facilitate such delayed writes to cache banks by (a) accurately estimating the busy time of each cache bank through logical partitioning of the cache layer and (b) prioritizing packets in a router requesting accesses to idle banks. Evaluations on a 3D architecture, consisting of 64 cores and 64 STT-RAM cache banks, show that our proposed approach provides 14% average IPC improvement for multi-threaded benchmarks, 19% instruction throughput benefits for multi-programmed workloads, and 6% latency reduction compared to a recently proposed write buffering mechanism.
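The prioritization idea in the abstract can be illustrated with a minimal sketch. This is not the paper's mechanism: the class `BankAwareArbiter`, the `Packet` fields, and the latency constants are hypothetical names and values chosen for illustration, and the busy-time estimate here is a plain per-bank counter rather than the logical partitioning of the cache layer that the paper describes.

```python
from dataclasses import dataclass

# Hypothetical latencies in cycles; the abstract gives no concrete values.
READ_LATENCY = 3
WRITE_LATENCY = 30


@dataclass
class Packet:
    bank: int       # destination cache bank
    is_write: bool  # True for a (long-latency) write request
    arrival: int    # cycle at which the packet reached the router


class BankAwareArbiter:
    """Toy router arbiter that prefers packets headed to idle banks."""

    def __init__(self, num_banks):
        # Estimated cycle at which each bank finishes its queued work.
        self.busy_until = [0] * num_banks

    def is_idle(self, bank, now):
        return self.busy_until[bank] <= now

    def select(self, queue, now):
        """Remove and return the next packet to forward, or None if empty."""
        if not queue:
            return None
        # Packets to idle banks sort first (False < True); ties go to the oldest.
        chosen = min(
            queue,
            key=lambda p: (not self.is_idle(p.bank, now), p.arrival),
        )
        queue.remove(chosen)
        # Update the busy-time estimate for the destination bank.
        latency = WRITE_LATENCY if chosen.is_write else READ_LATENCY
        start = max(now, self.busy_until[chosen.bank])
        self.busy_until[chosen.bank] = start + latency
        return chosen
```

For example, if a write to bank 0 is forwarded first, a later read to bank 0 is held back while a younger read to the idle bank 1 is forwarded ahead of it. The delayed read is forwarded only when no packet to an idle bank is waiting.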

Design of last-level on-chip cache using spin-torque transfer RAM (STT RAM)

RAM and TCAM Designs by Using STT-MRAM

Read-Tuned STT-RAM and eDRAM Cache Hierarchies for Throughput and Energy Enhancement

Asymmetric-access aware optimization for STT-RAM caches with process variations.

An Energy-Efficient Scheme for STT-RAM L1 Cache

A Cache Energy Optimization Technique for STT-RAM Last Level Cache

Architecting On-Chip Interconnects for Stacked 3D STT-RAM Caches in CMPs

Probabilistic design methodology to improve run-time stability and performance of STT-RAM caches

HALLS: An Energy-Efficient Highly Adaptable Last Level STT-RAM Cache for Multicore Systems

An Architecture-Level Cache Simulation Framework Supporting Advanced PMA STT-MRAM

Spin-hall Assisted STT-RAM Design and Discussion

A SELECTIVE READ-BEFORE-WRITE SCHEME FOR ENERGY-AWARE SPIN TORQUE TRANSFER RAM (STT-RAM) CACHE DESIGN

Adaptive Placement and Migration Policy for an STT-RAM-based Hybrid Cache

Persistent and Nonpersistent Error Optimization for STT-RAM Cell Design.

TriZone: A Design of MLC STT-RAM Cache for Combined Performance, Energy, and Reliability Optimizations

Designing Scratchpad Memory Architecture with Emerging STT-RAM Memory Technologies

A Novel Architecture Of The 3d Stacked Mram L2 Cache For Cmps

EXTENT: Enabling Approximation-Oriented Energy Efficient STT-RAM Write Circuit

Phase Based And Application Based Dynamic Encoding Scheme For Multi-Level Cell Stt-Ram

A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems

Research and Analysis of Design and Optimization of Magnetic Memory Material Cache Based on STT-MRAM