Abstract:Recent SRAM-based computation-in-memory (CIM) macros enable mid-to-high precision multiply-and-accumulate (MAC) operations with improved energy efficiency using ultra-small/small capacity (0.4-8KB) memory devices. However, advanced CIM-based edge-AI chips favor multiple mid/large capacity SRAM-CIM macros: with high input (IN) and weight (W) precision to reduce the frequency of data reloads from external DRAM, and to avoid the need for additional SRAM buffers or ultra-large on-chip weight buffers. However, enlarging memory capacity and throughput increases the delay parasitics on WLs and BLs, and the number of parallel computing elements; resulting in longer compute latency (tAC), lower energy-efficiency (EF), degraded signal margin, and larger fluctuations in power consumption across data-patterns (see Fig. 16.3.1). Recent SRAM-CIM macros tend to not use in-lab SRAM cells, with a logic-based layout, in favor of foundry provided compact-layout 8T [2], 3, [5] or 6T cells with local-computing cells (LCCs) [4], [6] to reduce the cell-array area and facilitate manufacturing. This paper presents a SRAM-CIM structure using (1) a segmented-BL charge-sharing (SBCS) scheme for MAC operations, with low energy consumption and a consistently high signal margin across MAC values (MACV); (2) An new LCC cell, called a source-injection local-multiplication cell (SILMC), to support the SBCS scheme with a consistent signal margin against transistor process variation; and (3) A prioritized-hybrid-ADC (Ph-ADC) to achieve a small area and power overhead for analog readout. A 28nm 384kb SRAM-CIM macro was fabricated using a foundry compact-6T cell with support for MAC operations with 16 accumulations of 8b-inputs and 8b-weights with near-full precision output (20b). This macro achieves a 7.2ns tAC and a 22.75TOPS/W EF for 8b-MAC operations with an FoM (IN-precision × W-precision × output-ratio × output-channel × EF/tAC) 6× higher than prior work.

A 65-Nm RRAM Compute-in-Memory Macro for Genome Processing

A 40nm 150 TOPS/W High Row-Parallel MRAM Compute-in-Memory Macro with Series 3T1MTJ Bitcell for MAC Operation

A 28 Nm RRAM-Based 81.1 TOPS/mm2/bit Compute-In-Memory Macro with Uniform and Linear 64 Read Channels under 512 4-Bit Inputs

An RRAM-Based Digital Computing-in-Memory Macro with Dynamic Voltage Sense Amplifier and Sparse-Aware Approximate Adder Tree

A 40-nm, 64-Kb, 56.67 TOPS/W Voltage-Sensing Computing-In-Memory/Digital RRAM Macro Supporting Iterative Write With Verification and Online Read-Disturb Detection

A 28nm 2Mb STT-MRAM Computing-in-Memory Macro with a Refined Bit-Cell and 22.4 - 41.5TOPS/W for AI Inference.

A 28nm Hybrid 2T1R RRAM Computing-in-Memory Macro for Energy-efficient AI Edge Inference

A Reconfigurable 4T2R ReRAM Computing In-Memory Macro for Efficient Edge Applications

A 28-Nm RRAM Computing-in-Memory Macro Using Weighted Hybrid 2T1R Cell Array and Reference Subtracting Sense Amplifier for AI Edge Inference

A 28-Nm 9t Sram-Based Cim Macro with Capacitance Weighting Module and Redundant Array-Assisted Adc

Configurable and High-Throughput CIM SRAM for Boolean Logic Operation with 321 GOPS/kb and 164395.6 GOPS/mm2

15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications

An 8.8 TFLOPS/W Floating-Point RRAM-Based Compute-in-Memory Macro Using Low Latency Triangle-Style Mantissa Multiplication

Cramming More Weight Data Onto Compute-in-Memory Macros for High Task-Level Energy Efficiency Using Custom ROM with 3984-Kb/mm$^{2}$ Density in 65-Nm CMOS

16.3 A 28nm 384kb 6T-SRAM Computation-in-Memory Macro with 8b Precision for AI Edge Chips

An 8T SRAM Based Digital Compute-In-Memory Macro For Multiply-And-Accumulate Accelerating.

A Novel 9T1C-SRAM Compute-In-Memory Macro With Count-Less Pulse-Width Modulation Input and ADC-Less Charge-Integration-Count Output

A 1.041-Mb/mm 2 27.38-TOPS/W Signed-INT8 Dynamic-Logic-Based ADC-less SRAM Compute-in-Memory Macro in 28nm with Reconfigurable Bitwise Operation for AI and Embedded Applications

A 28nm 32kb SRAM Computing-in-Memory Macro with Hierarchical Capacity Attenuator and Input Sparsity-Optimized ADC for 4b Mac Operation

In Situ Storing 8T SRAM-CIM Macro for Full-Array Boolean Logic and Copy Operations

A 6.54-to-26.03 TOPS/W Computing-In-Memory RNN Processor Using Input Similarity Optimization and Attention-based Context-breaking with Output Speculation