Abstract:Non-volatile memory (NVM) based computing-in-memory (CIM) shows significant advantages in handling deep learning tasks for artificial intelligence (AI) applications. To overcome the decreasing cost effectiveness of transistor scaling and the intrinsic inefficiency of data-shuttling in the von-Neumann architecture, CIM is proposed to realize high-speed and low-power system with parallel multiplication accumulation (MAC) computing [1] [2]. However, current demonstrations are mainly based on single macro and present limited computing parallelism. Realizing a fully-integrated CIM chip with a complete neural network model is still missing. The major challenges lie in: (1) The IR drop and transient errors when carrying out MAC operations in non-volatile memory arrays decrease the computing accuracy and further limit the parallelism; (2) The inefficiency of the interface blocks between different arrays due to the power overhead of the A/D and D/A converters (shown in Fig. 33.2.1). To address these challenges, this work proposes: (1) A sign-weighted 2T2R (SW-2T2R) array to reduce IR drop by decreasing the accumulative SL current (ISL), and eventually boost the computing parallelism; (2) a low-power interface design with resolution-adjustable LPAR-ADC to realize flexible tradeoff between system accuracy and power consumption. In this manner, this work implements a fully-integrated 784-100-10 MLP model on an integrated CIM chip with158.8kb analog ReRAMs. This chip realizes high recognition accuracy (94.4%) on MNIST database, high inference speed (77 µs/lmage), and 78.4 TOPS/W peak energy efficiency. The CMOS circuits are fabricated in a 130nm process.

An Edram Based Computing-in-Memory Macro with Full-Valid-Storage and Channel-Wise-Parallelism for Depthwise Neural Network

MixCIM: A Hybrid-Cell-Based Computing-in-Memory Macro with Less-Data-Movement and Activation-Memory-Reuse for Depthwise Separable Neural Networks

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.

SSM-CIM: an Efficient CIM Macro Featuring Single-Step Multi-bit MAC Computation for CNN Edge Inference

A 28nm 32kb SRAM Computing-in-Memory Macro with Hierarchical Capacity Attenuator and Input Sparsity-Optimized ADC for 4b Mac Operation

A 65 Nm 73 Kb SRAM-Based Computing-In-Memory Macro with Dynamic-Sparsity Controlling

A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference

A 28nm 8Kb Reconfigurable SRAM Computing-In-Memory Macro With Input-Sparsity Optimized DTC for Multi-mode MAC Operations

24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning

A Twin-8T SRAM Computation-in-Memory Unit-Macro for Multibit CNN-Based AI Edge Processors

33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing.

A 1–8b Reconfigurable Digital SRAM Compute-in-Memory Macro for Processing Neural Networks

An XOR-10T SRAM computing-in-memory macro with current MAC operations and time-to-digital conversion for BNN edge processors

A 4-Bit Calibration-Free Computing-In-Memory Macro with 3T1C Current-Programed Dynamic-Cascode Multi-Level-Cell Edram

A Reconfigurable Computing-in-Memory Accelerator with Dynamic Group-Based Dataflow and Dual-Input Macro Designs

A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference

An ADC-less RRAM-based Computing-in-Memory Macro with Binary CNN for Efficient Edge AI

An Energy-Efficient Floating-Point Compute SRAM with Pipelined In-Memory Bit-Parallel Exponent and Bitwise Mantissa Processing

S2D-CIM: A 22nm 128kb Systolic Digital Compute-in-Memory Macro with Domino Data Path for Flexible Vector Operation and 2-D Weight Update in Edge AI Applications

A 128 Kb DAC-less 6T SRAM computing-in-memory macro with prioritized subranging ADC for AI edge applications

DS-CIM: A 40nm Asynchronous Dual-Spike Driven, MRAM Compute-In-Memory Macro for Spiking Neural Network