Abstract: RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. However, in the presence of RRAM device variations and reduced precision, mapping DNNs to RRAM-based IMC suffers from severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro using a programmable shifter to compensate for the RRAM variations and recover the accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The nonideal output of the RRAM macro, caused by device and circuit nonidealities, is compensated by adding the precise output of the SRAM macro. In addition, the programmable shifter allows for different scales of compensation by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework for training DNNs to support the hybrid IMC architecture through ensemble learning. The proposed framework performs quantization (of weights and activations), pruning, and RRAM IMC-aware training, and employs ensemble learning across different compensation scales by utilizing the programmable shifter. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65-nm SUNY process to demonstrate its efficacy. Experimental evaluation of the hybrid IMC architecture shows that the SRAM compensation allows for a realistic IMC architecture with multilevel RRAM cells (MLCs) even though these cells suffer from high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively.
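
The compensation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian weight-noise model, the left-shift direction, and the names `mac`, `noisy_rram_mac`, and `hybrid_mac` are assumptions made for illustration only, and the SRAM compensation weights are placeholders rather than trained values.

```python
import random


def mac(weights, activations):
    """Ideal multiply-and-accumulate: a plain dot product."""
    return sum(w * x for w, x in zip(weights, activations))


def noisy_rram_mac(weights, activations, sigma, rng):
    """MAC with each stored weight perturbed by Gaussian noise.

    The noise model is an assumption standing in for RRAM device variation.
    """
    return sum((w + rng.gauss(0.0, sigma)) * x
               for w, x in zip(weights, activations))


def hybrid_mac(rram_out, sram_weights, activations, shift):
    """Add the precise SRAM MAC result, scaled by 2**shift, to the RRAM output.

    `shift` models the programmable shifter; only non-negative (left) shifts
    are handled in this sketch.
    """
    if shift < 0:
        raise ValueError("this sketch only supports non-negative shifts")
    sram_out = mac(sram_weights, activations)
    return rram_out + sram_out * (1 << shift)


# Hypothetical usage: small integer weights and activations.
rng = random.Random(0)
activations = [3, 1, 2, 0]
rram_weights = [2, -1, 3, 1]
sram_weights = [1, 0, -1, 0]  # placeholder compensation weights

rram_out = noisy_rram_mac(rram_weights, activations, sigma=0.3, rng=rng)
for shift in (0, 1, 2):  # different compensation scales
    print(shift, hybrid_mac(rram_out, sram_weights, activations, shift))
```

Looping over several `shift` values mirrors the idea of different compensation scales; in the abstract, these scales are what the ensemble-learning framework exploits during training.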

A 3D Multi-Layer CMOS-RRAM Accelerator for Neural Network

DaDianNao: A Machine-Learning Supercomputer

LayCO: Achieving Least Lossy Accuracy for Most Efficient RRAM-Based Deep Neural Network Accelerator via Layer-Centric Co-Optimization

RRAM based learning acceleration.

Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

A Low-Latency DNN Accelerator Enabled by DFT-Based Convolution Execution Within Crossbar Arrays

RRAM-DNN: an RRAM and Model-Compression Empowered All-Weights-On-Chip DNN Accelerator

RRAM-based Analog-Weight Spiking Neural Network Accelerator with In-Situ Learning for IoT Applications

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM

TIME: A Training-in-Memory Architecture for RRAM-Based Deep Neural Networks

Hybrid RRAM/SRAM in-Memory Computing for Robust DNN Acceleration

High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS

An Energy Efficient Computing-in-Memory Accelerator With 1T2R Cell and Fully Analog Processing for Edge AI Applications

A Reduced Architecture for ReRAM-Based Neural Network Accelerator and Its Software Stack

ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars

A Hybrid RRAM-SRAM Computing-In-Memory Architecture for Deep Neural Network Inference-Training Edge Acceleration

Efficient Implementation of Multi-Channel Convolution in Monolithic 3D ReRAM Crossbar

Intra-array Non-Idealities Modeling and Algorithm Optimization for RRAM-based Computing-in-Memory Applications

Low Bit-Width Convolutional Neural Network on RRAM

Mixed Size Crossbar Based RRAM CNN Accelerator with Overlapped Mapping Method