Abstract: Emerging memory technologies such as STT-RAM, PCRAM, and resistive RAM are being explored as potential replacements for existing on-chip caches or main memories in future multi-core architectures. This is due to the many attractive features these memory technologies possess: high density, low leakage, and non-volatility. However, the latency and energy overhead associated with write operations in these emerging memories has become a major obstacle to their adoption. Previous works have proposed various circuit- and architecture-level solutions to mitigate the write overhead. In this paper, we study the integration of STT-RAM in a 3D multi-core environment and propose solutions at the on-chip network level to circumvent the write overhead problem in caches built with STT-RAM technology. Our scheme is based on the observation that, instead of staggering requests to a write-busy STT-RAM bank, the network should schedule requests to other, idle cache banks to effectively hide the latency. Thus, we prioritize cache accesses to idle banks by delaying accesses to the STT-RAM cache banks that are currently serving long-latency write requests. Through a detailed characterization of the cache access patterns of 42 applications, we propose an efficient mechanism to facilitate such delayed writes to cache banks by (a) accurately estimating the busy time of each cache bank through logical partitioning of the cache layer and (b) prioritizing packets in a router that request access to idle banks. Evaluations on a 3D architecture consisting of 64 cores and 64 STT-RAM cache banks show that our proposed approach provides a 14% average IPC improvement for multi-threaded benchmarks, a 19% instruction throughput benefit for multi-programmed workloads, and a 6% latency reduction compared to a recently proposed write buffering mechanism.
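The router-side idea described above (steering packets toward idle banks while a bank is tied up by a long STT-RAM write) can be illustrated with a small scheduling sketch. This is not the authors' implementation. The latency constants `READ_LATENCY` and `WRITE_LATENCY`, the starvation bound `MAX_WAIT`, and the names `Packet`, `BankBusyEstimator`, and `pick_next` are all hypothetical. The busy-time estimate is a simple per-bank "free at" counter that stands in for the paper's partition-based estimator, and it ignores network traversal delay.

```python
from dataclasses import dataclass

# Hypothetical latencies in cycles (assumptions, not values from the paper).
READ_LATENCY = 10
WRITE_LATENCY = 33
# Hypothetical starvation bound: a packet waiting this long is served regardless.
MAX_WAIT = 64


@dataclass(eq=False)  # eq=False so list.remove() removes by identity
class Packet:
    bank: int       # destination cache bank
    is_write: bool  # True for a (long-latency) STT-RAM write
    arrival: int    # cycle at which the packet arrived at this router


class BankBusyEstimator:
    """Tracks an estimated cycle at which each bank becomes free.

    Assumption: the router only knows about requests it has itself
    forwarded, loosely mirroring a per-partition view of the cache layer.
    """

    def __init__(self, banks):
        self.free_at = {b: 0 for b in banks}

    def is_busy(self, bank, now):
        return self.free_at[bank] > now

    def record(self, pkt, now):
        # A new request starts once the bank finishes its current access.
        start = max(now, self.free_at[pkt.bank])
        latency = WRITE_LATENCY if pkt.is_write else READ_LATENCY
        self.free_at[pkt.bank] = start + latency


def pick_next(queue, estimator, now):
    """Select and dequeue the next packet to forward, or return None.

    Priority order: starving packets first, then packets headed to idle
    banks, then everything else; ties are broken by age (oldest first).
    """
    if not queue:
        return None
    starving = [p for p in queue if now - p.arrival >= MAX_WAIT]
    idle = [p for p in queue if not estimator.is_busy(p.bank, now)]
    candidates = starving or idle or queue
    chosen = min(candidates, key=lambda p: p.arrival)
    queue.remove(chosen)
    estimator.record(chosen, now)
    return chosen
```

In this sketch, a write to bank 0 at cycle 0 marks that bank busy until cycle 33. At cycle 1, a younger read to idle bank 1 is therefore forwarded ahead of an older read to bank 0. The `MAX_WAIT` guard is an added assumption that keeps delayed requests to a write-busy bank from starving indefinitely.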

An STT-MRAM Based in Memory Architecture for Low Power Integral Computing

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

Toward Energy Efficient STT-MRAM-based Near Memory Computing Architecture for Embedded Systems

An energy efficient and high speed architecture for convolution computing based on binary resistive random access memory

A Multilevel Cell STT-MRAM-Based Computing In-Memory Accelerator for Binary Convolutional Neural Network

Proposal of Analog In-Memory Computing with Magnified Tunnel Magnetoresistance Ratio and Universal STT-MRAM Cell

A 28nm 8928Kb/mm²-Weight-Density Hybrid SRAM/ROM Compute-in-Memory Architecture Reducing >95% Weight Loading from DRAM

Architecture-level energy model for high-capacity STT-MRAM memory

In‐Memory Mathematical Operations with Spin‐Orbit Torque Devices

Architecting On-Chip Interconnects for Stacked 3D STT-RAM Caches in CMPs

Area-Aware Optimization of MRAM Crossbar Array Bit-Cell for In-Memory Computing

Architecture-circuit-technology Co-Optimization for Resistive Random Access Memory-Based Computation-in-memory Chips

STT-RAM-Based Hierarchical in-Memory Computing

CHIME: Energy-Efficient STT-RAM-based Concurrent Hierarchical In-Memory Processing

Triangle Counting Accelerations: From Algorithm to In-Memory Computing Architecture

Domain-Specific STT-MRAM-Based In-Memory Computing: A Survey

Enabling architectural innovations using non-volatile memory

Low Power Computing Using STT-MRAM

Stoch-IMC: A Bit-Parallel Stochastic In-Memory Computing Architecture Based on STT-MRAM

EXTENT: Enabling Approximation-Oriented Energy Efficient STT-RAM Write Circuit