Abstract: In the post-Moore era, compute-in-memory (CIM) techniques are promising candidates for breaking the memory wall. In particular, SRAM-based CIMs (SRAM-CIMs) have attracted widespread attention owing to their good scalability with advanced process nodes. At present, a rich variety of works focus on improving energy efficiency, either by designing different bit-cell structures or by optimizing circuit/chip architectures. However, because a CIM inherently stores one of the operands in its memory bit-cells, computation is suspended while operands are loaded, and substantial computing resources are wasted. In this paper, a high-throughput SRAM-CIM (HiT-CIM) architecture with simultaneous weight loading and computing capabilities is proposed by integrating on-chip nonvolatile MRAM (magnetic random-access memory). Both the mainstream current-domain and charge-domain SRAM bit-cell structures are optimized to support this architecture. Furthermore, a reconfigurable, fully pipelined MRAM is designed to provide fast data loading in HiT-CIM; it can rapidly adjust the weight loading strategy for different neural network models. An optimal evaluation and configuration strategy is then proposed to improve macro-level performance by considering the key components and parameters: SRAM array, ADC, MRAM structure and frequency. Finally, the feasibility of HiT-CIM is verified in a 40-nm foundry process. The results show that a multiple-fold speed improvement is obtained on VGG19, ResNet18 and MobileNetV1. Specifically, the area efficiency of HiT-CIM on VGG19 reaches 1124 GOPS/mm² and 1880.12 GOPS/mm² for the current-domain and charge-domain SRAM-CIMs, respectively. Up to 5.3× improvement is realized compared with prior works.
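
The abstract's central argument is that a conventional CIM stalls computation while weights are loaded, whereas HiT-CIM overlaps the two. A minimal sketch of that timing argument follows. It is a simplified first-order model, not the paper's evaluation method: the function names, the per-tile load and compute times, and the tile count are all hypothetical and chosen only for illustration.

```python
def serialized_time(t_load, t_compute, n_tiles):
    """Total time when computing is suspended during every weight load."""
    return n_tiles * (t_load + t_compute)


def overlapped_time(t_load, t_compute, n_tiles):
    """Total time when the next tile's weights load while the current tile computes.

    The first load and the last compute cannot be hidden; each of the
    (n_tiles - 1) overlapped steps in between is bounded by the slower
    of loading and computing.
    """
    return t_load + (n_tiles - 1) * max(t_load, t_compute) + t_compute


def speedup(t_load, t_compute, n_tiles):
    """Ratio of serialized to overlapped execution time."""
    return serialized_time(t_load, t_compute, n_tiles) / overlapped_time(
        t_load, t_compute, n_tiles
    )


if __name__ == "__main__":
    # Hypothetical per-tile times in arbitrary units (not taken from the paper).
    print(speedup(t_load=2, t_compute=3, n_tiles=4))
```

Under this model the gain is largest when load and compute times are comparable, and it approaches a factor of two as the number of tiles grows; larger improvements would have to come from other factors such as faster loading.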

A 40nm 1Mb 35.6 TOPS/W MLC NOR-Flash Based Computation-in-Memory Structure for Machine Learning

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

MLFlash-CIM: Embedded Multi-Level NOR-Flash Cell based Computing in Memory Architecture for Edge AI Devices

Simulation of a Fully Digital Computing-in-Memory for Non-Volatile Memory for Artificial Intelligence Edge Applications

Design of Computing-in-Memory (CIM) with Vertical Split-Gate Flash Memory for Deep Neural Network (DNN) Inference Accelerator

33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing

A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference

Optimized operation scheme of flash-memory-based neural network online training with ultra-high endurance

A 28nm 32kb SRAM Computing-in-Memory Macro with Hierarchical Capacity Attenuator and Input Sparsity-Optimized ADC for 4b Mac Operation

IMPACT: In-Memory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs

In-Memory Computing Integrated Structure Circuit Based on Nonvolatile Flash Memory Unit

A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

TAC-RAM: A 65nm 4kb SRAM Computing-in-Memory Design with 57.55 TOPS/W Supporting Multibit Matrix-Vector Multiplication for Binarized Neural Network

A 28nm 2Mb STT-MRAM Computing-in-Memory Macro with a Refined Bit-Cell and 22.4 - 41.5TOPS/W for AI Inference

24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning

Specific ADC of NVM-Based Computation-in-Memory for Deep Neural Networks

A 3D MCAM Architecture Based on Flash Memory Enabling Binary Neural Network Computing for Edge AI

HiT-CIM: A High-Throughput Compute-In-Memory SRAM Architecture with Simultaneous Weight Loading/Computing and Balance Capabilities

A 2.75-to-75.9TOPS/W Computing-in-Memory NN Processor Supporting Set-Associate Block-Wise Zero Skipping and Ping-Pong CIM with Simultaneous Computation and Weight Updating

15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications