Abstract: Convolutional Neural Networks (CNNs) have achieved excellent performance on various artificial intelligence (AI) applications, but future AI systems will demand higher energy efficiency. Resistive Random-Access Memory (RRAM)-based computing systems provide a promising solution for energy-efficient neural network training. However, it is difficult to support high-precision CNNs in RRAM-based hardware systems. First, multi-bit digital-analog interfaces account for most of the energy overhead of the whole system. Second, it is difficult to write RRAM cells to the expected resistance states accurately, so only low-precision numbers can be represented. To enable CNN training on RRAM, we propose a low-bitwidth CNN training method that uses low-bitwidth convolution outputs (CO), activations (A), weights (W) and gradients (G). Furthermore, we design a system to implement the training algorithms. We explore the accuracy under different bitwidth combinations of (A, CO, W, G) and propose a practical tradeoff between accuracy and energy overhead. Our experiments demonstrate that the proposed system performs well on low-bitwidth CNN training tasks. For example, training LeNet-5 with 4-bit convolution outputs, 4-bit weights, 4-bit activations and 4-bit gradients on MNIST still achieves 97.67% accuracy. Moreover, the proposed system achieves 23.0X higher energy efficiency than a GPU when training LeNet-5, and 4.4X higher energy efficiency when training ResNet-20.
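
To make "low-bitwidth" concrete, the following is a minimal sketch of symmetric uniform quantization to a given number of bits. It is an illustration only: the abstract does not specify the quantizer the authors use, and the function name `quantize_uniform`, the `max_abs` clipping range, and the choice of a symmetric signed grid are all assumptions made here.

```python
import numpy as np

def quantize_uniform(x, bits, max_abs=1.0):
    """Round x onto a symmetric uniform grid (hypothetical helper, not the paper's method).

    With `bits` bits the grid has 2**(bits-1) - 1 positive levels, the same
    number of negative levels, and zero; values outside [-max_abs, max_abs]
    are clipped first.
    """
    levels = 2 ** (bits - 1) - 1  # e.g. 7 positive levels for 4-bit values
    x = np.clip(np.asarray(x, dtype=np.float64), -max_abs, max_abs)
    return np.round(x / max_abs * levels) / levels * max_abs

# Example: quantize a dense sweep of values in [-1, 1] to 4 bits.
samples = np.linspace(-1.0, 1.0, 1001)
q4 = quantize_uniform(samples, bits=4)
```

Under these assumptions a 4-bit tensor takes at most 15 distinct values, and the rounding error of any in-range value is bounded by half a grid step, which gives a sense of the precision loss that a low-bitwidth training scheme has to tolerate in A, CO, W and G.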

Efficient SRAM computing in memory for CNN

A Configurable Computing-in-Memory Structure Based on Convolutional Neural Network

Energy-Efficient SRAM Design with Data-Aware Dual-Modes 10T Storage Cell for CNN Processors

CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference

Reducing SRAM Reading Power with Column Data Segment and Weights Correlation Enhancement for CNN Processing.

Switched by input: power efficient structure for RRAM-based convolutional neural network.

Training Low Bitwidth Convolutional Neural Network on RRAM

An Energy-Efficient Mixed-Bit ReRAM-based Computing-in-Memory CNN Accelerator with Fully Parallel Readout.

RISC-V based Fully-Parallel SRAM Computing-in-Memory Accelerator with High Hardware Utilization and Data Reuse Rate

A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM

Design Framework for SRAM-Based Computing-In-Memory Edge CNN Accelerators

RRAM Based Buffer Design for Energy Efficient CNN Accelerator.

TAC-RAM: A 65nm 4kb SRAM Computing-in-Memory Design with 57.55 TOPS/W Supporting Multibit Matrix-Vector Multiplication for Binarized Neural Network.

A 4-Kb 1-to-8-bit Configurable 6T SRAM-Based Computation-in-Memory Unit-Macro for CNN-Based AI Edge Processors

A Systolic Computing-in-Memory Array Based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions

24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning

A ReRAM-Based Row-Column-Oriented Memory Architecture for Convolutional Neural Networks.

An Energy-Efficient Mixed-Bit CNN Accelerator With Column Parallel Readout for ReRAM-Based In-Memory Computing

A 28nm 64-Kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

A 28-Nm 135.19 TOPS/W Bootstrapped-SRAM Compute-in-Memory Accelerator with Layer-Wise Precision and Sparsity

Time-Domain Computing in Memory Using Spintronics for Energy-Efficient Convolutional Neural Network