Abstract: This article presents a compact, robust, and transposable SRAM in-memory computing (IMC) macro that supports both feedforward (FF) and backpropagation (BP) computation within a single macro. The transpose macro is built with a clustering structure in which eight 6T bitcells share one charge-domain computing unit (CCU) to deploy DNN weights efficiently. The normalized area overhead of the clustering structure relative to a 6T SRAM cell is only 0.37. During computation, the CCU performs robust charge-domain operations on the parasitic capacitances of the local bitlines in the IMC cluster. In the FF mode, the proposed design supports 128-input 1b XNOR and 1b AND multiply-and-accumulate (MAC) operations. The 1b AND operation can be extended to multi-bit MAC via bit-serial (BS) mapping, which supports DNNs of various precisions. A power-gated auto-zero flash analog-to-digital converter (ADC) reduces the input offset voltage while maintaining overall energy efficiency and throughput. The proposed macro is prototyped in a 28-nm CMOS process. It demonstrates a 1b energy efficiency of $166 \vert 257$ TOPS/W in FF-XNOR $\vert$ AND mode and 31.8 TOPS/W in BP mode. The macro achieves $80.26\% \vert 85.07\%$ classification accuracy on the CIFAR-10 dataset with 1b $\vert$ 4b CNN models. In addition, the BP mode of the proposed transpose IMC macro achieves 95.50% classification accuracy on the MNIST dataset (95.66% software accuracy).
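The 1b XNOR and AND MACs and the bit-serial extension described above can be illustrated with a short arithmetic sketch. This is a software model only, not the macro's charge-domain circuit. The function names, the unsigned operand format, and the assignment of bit planes to cycles or columns are assumptions made for illustration; the abstract does not specify them.

```python
def and_mac_1b(inputs, weights):
    """1b AND MAC: count the positions where both bits are 1."""
    return sum(x & w for x, w in zip(inputs, weights))


def xnor_mac_1b(inputs, weights):
    """1b XNOR MAC, assuming bit 1 encodes +1 and bit 0 encodes -1.

    Returns the signed dot product of the two +/-1 vectors.
    """
    n = len(inputs)
    matches = sum(1 - (x ^ w) for x, w in zip(inputs, weights))
    return 2 * matches - n


def bit_serial_mac(inputs, weights, in_bits, w_bits):
    """Unsigned multi-bit MAC composed of 1b AND MACs (assumed mapping).

    Each pair of bit planes (input bit i, weight bit j) yields one 1b AND
    MAC result, which is weighted by 2**(i + j) and accumulated.
    """
    total = 0
    for i in range(in_bits):
        x_plane = [(x >> i) & 1 for x in inputs]
        for j in range(w_bits):
            w_plane = [(w >> j) & 1 for w in weights]
            total += and_mac_1b(x_plane, w_plane) << (i + j)
    return total


if __name__ == "__main__":
    xs = [3, 5, 7, 1]   # 3-bit unsigned inputs (illustrative values)
    ws = [2, 4, 6, 8]   # 4-bit unsigned weights (illustrative values)
    reference = sum(x * w for x, w in zip(xs, ws))
    print(bit_serial_mac(xs, ws, in_bits=3, w_bits=4) == reference)
```

In this model a 4b-by-4b MAC costs 16 binary AND MAC passes, which is the general trade-off of bit-serial mapping: precision is bought with additional 1b operations rather than with wider analog hardware.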

A 28 nm 16 kb Bit-Scalable Charge-Domain Transpose 6T SRAM In-Memory Computing Macro