Abstract:Processing-in-memory (PIM) architectures show the advantage of handling applications that generate complicated memory request patterns; usually, those kinds of memory streams degrade the application’s performance in conventional memory hierarchy systems. In particular, deep convolutional neural networks (DCNNs) processing that consists of several functionalities could be highly optimized if PIM cores can extend the processing capability and data accessibility. In this work, we propose a functionality-based PIM accelerator for DCNNs. We design several modules in addition to the conventional PIM system based on a hybrid memory cube (HMC). First, we compose a new buffer module, namely, a shared cache, in which PIM cores are provided DCNN functionalities and pre-trained weights. The PIM cores subsequently enhance computational utilization and data accessibility. Second, an efficient replacement method complements the shared cache to optimize the data miss rate of DCNN processing. Third, we compose dual prefetchers that can deal with DCNN’s memory access patterns, thereby reducing the system’s overall latency. Fourth, we compose a PIM scheduler for PIM core-level autonomous request control. The PIM scheduler relieves the host processor of significant computational loads, achieving the overall latency of the system and reducing the energy consumption. By the performance evaluation based on the trace-driven HMC simulator, our proposed model improves average latency and bandwidth by 38.9 and 27.9 % with only 18.7 % more energy consumption compared with conventional HMC-based PIM systems. Our system also achieves scalable processing performance because when the DCNN becomes deeper, it processes faster than conventional PIM systems.

MobiLatice

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

Lattice: an ADC/DAC-less ReRAM-based Processing-In-Memory Architecture for Accelerating Deep Convolution Neural Networks

Simulation of a Fully Digital Computing-in-Memory for Non-Volatile Memory for Artificial Intelligence Edge Applications

NAND-SPIN-based processing-in-MRAM architecture for convolutional neural network acceleration

Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks

Specific ADC of NVM-Based Computation-in-Memory for Deep Neural Networks

A Reconfigurable Computing-in-Memory Accelerator with Dynamic Group-Based Dataflow and Dual-Input Macro Designs

StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM

LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Functionality-Based Processing-in-Memory Accelerator for Deep Convolutional Neural Networks

MixCIM: A Hybrid-Cell-Based Computing-in-Memory Macro with Less-Data-Movement and Activation-Memory-Reuse for Depthwise Separable Neural Networks

APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-level Operations of Quantized Convolutional Neural Networks

A Non-Volatile Computing-In-Memory Framework with Margin Enhancement Based CSA and Offset Reduction Based ADC.

Bulk-Switching Memristor-Based Compute-In-Memory Module for Deep Neural Network Training

Design of Computing-in-Memory (CIM) with Vertical Split-Gate Flash Memory for Deep Neural Network (DNN) Inference Accelerator

In-Memory Computing Integrated Structure Circuit Based on Nonvolatile Flash Memory Unit

End-to-End DNN Inference on a Massively Parallel Analog In Memory Computing Architecture

A Heterogeneous Microprocessor for Intermittent AI Inference Using Nonvolatile-SRAM-based Compute-In-Memory