Abstract: Compute-in-memory (CIM) is a promising approach to the memory-wall problem in traditional computing architectures. In this paper, we introduce SSM-CIM, a charge-domain, static random-access memory (SRAM)-based CIM macro designed for area- and energy-efficient convolutional neural network (CNN) inference. SSM-CIM uses an original sign-magnitude data encoding method for both inputs and weights. By co-designing four adjacent SRAM computing cells and employing a 3-bit digital-to-analog converter (DAC), SSM-CIM performs accurate 4-bit multiply-and-accumulate (MAC) computation in a single step, eliminating the peripheral digital shift-and-add circuits. To digitize the MAC results, a dedicated multi-reference-assisted SAR ADC is designed that reuses the reference voltages from the DAC, which offers significant power and area savings. In addition, analog computing errors and quantization errors are analyzed to ensure the multi-bit computing accuracy of SSM-CIM. SSM-CIM is implemented and evaluated using a 28-nm global foundry process. The post-layout simulation results validate the computing linearity and accuracy of SSM-CIM. Benefiting from the compact layout design and fully parallel computing flow, the $144\times 256$ macro achieves a peak throughput of 2.3 TOPS, an area efficiency of 10.2 TOPS/mm$^{2}$, and an energy efficiency of 205.4 TOPS/W with 4-bit weights and 4-bit inputs.
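As a purely behavioral illustration of the sign-magnitude MAC described in the abstract, the following Python sketch encodes 4-bit operands as one sign bit plus a 3-bit magnitude and accumulates the signed products. The 1 + 3 bit split is an assumption inferred from the 3-bit DAC, and the function names (`to_sign_magnitude`, `sign_magnitude_mac`) are hypothetical. The sketch models only the arithmetic, not the charge-domain circuit, the four-cell grouping, or the SAR ADC.

```python
from typing import Sequence, Tuple


def to_sign_magnitude(value: int, bits: int = 4) -> Tuple[int, int]:
    """Encode an integer as (sign_bit, magnitude).

    Assumption: 1 sign bit plus (bits - 1) magnitude bits, so a 4-bit
    operand covers the range [-7, 7].
    """
    limit = (1 << (bits - 1)) - 1
    if not -limit <= value <= limit:
        raise ValueError(f"{value} does not fit in {bits}-bit sign-magnitude")
    return (1 if value < 0 else 0, abs(value))


def from_sign_magnitude(sign: int, magnitude: int) -> int:
    """Decode (sign_bit, magnitude) back to a signed integer."""
    return -magnitude if sign else magnitude


def sign_magnitude_mac(inputs: Sequence[int], weights: Sequence[int],
                       bits: int = 4) -> int:
    """Accumulate input * weight products using sign-magnitude operands.

    The product sign is the XOR of the operand signs and the product
    magnitude is the product of the operand magnitudes.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    for x, w in zip(inputs, weights):
        x_sign, x_mag = to_sign_magnitude(x, bits)
        w_sign, w_mag = to_sign_magnitude(w, bits)
        acc += from_sign_magnitude(x_sign ^ w_sign, x_mag * w_mag)
    return acc


if __name__ == "__main__":
    xs = [3, -7, 0, 5]
    ws = [-2, -7, 4, 1]
    print(sign_magnitude_mac(xs, ws))
```

In this model the result equals the ordinary signed dot product of the operands; the point is only to show that the sign and magnitude paths can be handled separately, which is the property the encoding relies on.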

S2D-CIM: A 22nm 128kb Systolic Digital Compute-in-Memory Macro with Domino Data Path for Flexible Vector Operation and 2-D Weight Update in Edge AI Applications

A Robust 8-Bit Non-Volatile Computing-in-Memory Core for Low-Power Parallel MAC Operations.

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.

7.8 A 22nm Delta-Sigma Computing-In-Memory (ΔΣCIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38TOPS/W for 8b-MAC Edge AI Processing

S2D-CIM: SRAM-Based Systolic Digital Compute-in-Memory Framework with Domino Data Path Supporting Flexible Vector Operation and 2-D Weight Update

A 2.75-to-75.9TOPS/W Computing-in-Memory NN Processor Supporting Set-Associate Block-Wise Zero Skipping and Ping-Pong CIM with Simultaneous Computation and Weight Updating

SSM-CIM: An Efficient CIM Macro Featuring Single-Step Multi-bit MAC Computation for CNN Edge Inference

An eDRAM-Based Computing-in-Memory Macro with Full-Valid-Storage and Channel-Wise-Parallelism for Depthwise Neural Network

A High-Density and Reconfigurable SRAM-Based Digital Compute-In-Memory Macro for Low-Power AI Chips.

7.3 A 28nm 38-to-102-TOPS/W 8b Multiply-Less Approximate Digital SRAM Compute-In-Memory Macro for Neural-Network Inference

MixCIM: A Hybrid-Cell-Based Computing-in-Memory Macro with Less-Data-Movement and Activation-Memory-Reuse for Depthwise Separable Neural Networks

Simulation of a Fully Digital Computing-in-Memory for Non-Volatile Memory for Artificial Intelligence Edge Applications

An Event-Based Digital Compute-In-Memory Accelerator with Flexible Operand Resolution and Layer-Wise Weight/Output Stationarity

15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications

24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning

A 28-nm 36 Kb SRAM CIM Engine with 0.173 $\mu$m$^{2}$ 4T1T Cell and Self-Load-0 Weight Update for AI Inference and Training Applications

A Twin-8T SRAM Computation-in-Memory Unit-Macro for Multibit CNN-Based AI Edge Processors

A 65 nm 73 Kb SRAM-Based Computing-In-Memory Macro with Dynamic-Sparsity Controlling

A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference

A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs