Abstract:RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. On the other hand, in the presence of RRAM device variations and lower precision, the mapping of DNNs to RRAM-based IMC suffers from severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro using a programmable shifter to compensate for the RRAM variations and recover the accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The nonideal output from the RRAM macro, due to device and circuit nonidealities, is compensated by adding the precise output from the SRAM macro. In addition, the programmable shifter allows for different scales of compensation by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework for the training of DNNs to support the hybrid IMC architecture through ensemble learning. The proposed framework performs quantization (weights and activations), pruning, RRAM IMC-aware training, and employs ensemble learning through different compensation scales by utilizing the programmable shifter. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65-nm SUNY process to demonstrate its efficacy. Experimental evaluation of the hybrid IMC architecture shows that the SRAM compensation allows for a realistic IMC architecture with multilevel RRAM cells (MLCs) even though they suffer from high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively.

An In-Memory-Computing STT-MRAM Macro with Analog ReLU and Pooling Layers for Ultra-High Efficient Neural Network

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

In-Memory Multi-Bit Multiplication and Accumulation (MAC) Using FeFET for Energy Efficient IoT

SRAM-Based In-Memory Computing Macro Featuring Voltage-Mode Accumulator and Row-by-Row ADC for Processing Neural Networks

A Brain-Inspired ADC-Free SRAM-Based In-Memory Computing Macro with High-Precision MAC for AI Application

An ADC-less RRAM-based Computing-in-Memory Macro with Binary CNN for Efficient Edge AI

A ReRAM-Based Computing-in-Memory Convolutional-Macro with Customized 2T2R Bit-Cell for AIoT Chip IP Applications

A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference

TD-SRAM: Time-Domain-Based In-Memory Computing Macro for Binary Neural Networks

An 11T1C Bit-Level-Sparsity-Aware Computing-in-Memory Macro with Adaptive Conversion Time and Computation Voltage

Hybrid RRAM/SRAM in-Memory Computing for Robust DNN Acceleration

Efficient Discrete Temporal Coding Spike-Driven In-Memory Computing Macro for Deep Neural Network Based on Nonvolatile Memory.

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.

A 177 TOPS/W, Capacitor-based In-Memory Computing SRAM Macro with Stepwise-Charging/Discharging DACs and Sparsity-Optimized Bitcells for 4-Bit Deep Convolutional Neural Networks.

33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing.

An Energy Efficient Computing-in-Memory Accelerator With 1T2R Cell and Fully Analog Processing for Edge AI Applications

RIMAC: an Array-Level ADC/DAC-Free ReRAM-Based In-Memory DNN Processor with Analog Cache and Computation.

A Charge-Domain Scalable-Weight In-Memory Computing Macro With Dual-SRAM Architecture for Precision-Scalable DNN Accelerators

Proposal of Analog In-Memory Computing with Magnified Tunnel Magnetoresistance Ratio and Universal STT-MRAM Cell

An SRAM Compute-In-Memory Macro Based on Direct Coupling SAR ADC and DAC Reuse

A Fully Integrated System‐on‐Chip Design with Scalable Resistive Random‐Access Memory Tile Design for Analog in‐Memory Computing