Abstract: RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. However, in the presence of RRAM device variations and low precision, mapping DNNs to RRAM-based IMC suffers from severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro through a programmable shifter to compensate for RRAM variations and recover accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The RRAM macro output, degraded by device and circuit nonidealities, is compensated by adding the precise output of the SRAM macro. In addition, the programmable shifter allows different compensation scales by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework for training DNNs for the hybrid IMC architecture through ensemble learning. The proposed framework performs quantization (of weights and activations), pruning, and RRAM IMC-aware training, and employs ensemble learning across different compensation scales enabled by the programmable shifter. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65-nm SUNY process to demonstrate its efficacy. Experimental evaluation shows that SRAM compensation makes an IMC architecture with multilevel RRAM cells (MLCs) practical despite their high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively.
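
The compensation idea above (the precise SRAM output is added to the nonideal RRAM output, with a programmable shift setting the compensation scale) can be illustrated with a small numerical sketch. This is a hypothetical model, not the authors' implementation: the Gaussian variation model, the 4-bit SRAM width, and the names `program_rram`, `sram_residual` and `hybrid_output` are assumptions. The abstract describes obtaining the compensation through a training framework, whereas this sketch simply quantizes the residual between the target weights and the programmed RRAM weights.

```python
import random

# Hypothetical model (assumption, not the authors' code): integer weights are
# programmed into RRAM with Gaussian variation, and a low-precision SRAM
# residual, scaled by a programmable shift, is added to the RRAM output.


def matvec(w, x):
    """Dense matrix-vector product; w is a list of rows, x is a list."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def program_rram(w, sigma, rng):
    """Programmed RRAM weight = target level + Gaussian device variation (assumed model)."""
    return [[wij + rng.gauss(0.0, sigma) for wij in row] for row in w]


def sram_residual(w_target, w_rram, shift, bits):
    """Quantize (w_target - w_rram) to signed `bits`-bit integers with step 2**-shift.

    Residuals outside the representable range are clipped.
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [[max(qmin, min(qmax, round((t - r) * 2 ** shift)))
             for t, r in zip(trow, rrow)]
            for trow, rrow in zip(w_target, w_rram)]


def hybrid_output(w_rram, w_sram, x, shift):
    """RRAM MAC output plus SRAM MAC output scaled by 2**-shift (a fixed-point shift in hardware)."""
    y_rram = matvec(w_rram, x)
    y_sram = matvec(w_sram, x)
    return [yr + ys / 2 ** shift for yr, ys in zip(y_rram, y_sram)]


def max_abs_error(y, y_ref):
    """Largest absolute deviation between two output vectors."""
    return max(abs(a - b) for a, b in zip(y, y_ref))


rng = random.Random(0)
w_target = [[rng.randint(-3, 3) for _ in range(16)] for _ in range(8)]  # small signed weight levels
x = [rng.randint(0, 15) for _ in range(16)]                             # 4-bit unsigned activations
w_rram = program_rram(w_target, sigma=0.3, rng=rng)

y_ideal = matvec(w_target, x)
print(f"RRAM only: max |error| = {max_abs_error(matvec(w_rram, x), y_ideal):.3f}")
for shift in (0, 2, 4):
    w_sram = sram_residual(w_target, w_rram, shift, bits=4)
    y_hybrid = hybrid_output(w_rram, w_sram, x, shift)
    print(f"shift={shift}: max |error| = {max_abs_error(y_hybrid, y_ideal):.3f}")
```

In this toy model, a larger shift gives the SRAM residual finer resolution but a narrower range, so large deviations are clipped, while a smaller shift covers large deviations with coarse steps. This range/precision trade-off is one way to read why offering several compensation scales is useful.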

HARNS: High-level Architectural Model of RRAM Based Computing-in-memory NPU

A Unified Framework for Training, Mapping and Simulation of ReRAM-Based Convolutional Neural Network Acceleration

FangTianSim: High-Level Cycle-Accurate Resistive Random-Access Memory-Based Multi-Core Spiking Neural Network Processor Simulator

XB-SIM∗: A Simulation Framework for Modeling and Exploration of ReRAM-based CNN Acceleration Design

CLEAR: a Full-Stack Chip-in-loop Emulator for Analog RRAM Based Computing-in-memory System

NAS4RRAM: Neural Network Architecture Search for Inference on RRAM-based Accelerators

A Compact Model of Analog RRAM for Neuromorphic Computing System Design

HDC-IM: Hyperdimensional Computing In-Memory Architecture Based on RRAM

Design Guidelines of RRAM Based Neural-Processing-Unit

A Compact Model of Analog RRAM With Device and Array Nonideal Effects for Neuromorphic Systems

RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices

ReARTSim: An ReRAM Array Transient Simulator with GPU-Optimized Runtime Acceleration

Intelligent Computing with RRAM

Architecture-circuit-technology Co-Optimization for Resistive Random Access Memory-Based Computation-in-memory Chips

Intra-array Non-Idealities Modeling and Algorithm Optimization for RRAM-based Computing-in-Memory Applications

Hybrid RRAM/SRAM in-Memory Computing for Robust DNN Acceleration

Benchmarking and modeling of analog and digital SRAM in-memory computing architectures

Multi-Scale Thermal Modeling of RRAM-based 3D Monolithic-Integrated Computing-in-Memory Chips

Enabling RRAM-Based Brain-Inspired Computation by Co-design of Device, Circuit, and System

Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

Learning the Sparsity for ReRAM: Mapping and Pruning Sparse Neural Network for ReRAM Based Accelerator