Abstract: Compute-in-memory (CIM) accelerators using non-volatile memory (NVM) devices offer promising solutions for energy-efficient and low-latency Deep Neural Network (DNN) inference execution. However, practical deployment is often hindered by the challenge of dealing with the massive amount of model weight parameters impacted by the inherent device variations within non-volatile computing-in-memory (NVCIM) accelerators. This issue significantly offsets their advantages by increasing training overhead, the time and energy needed for mapping weights to device states, and diminishing inference accuracy. To mitigate these challenges, we propose the "Tiny Shared Block (TSB)" method, which integrates a small shared 1x1 convolution block into the DNN architecture. This block is designed to stabilize feature processing across the network, effectively reducing the impact of device variation. Extensive experimental results show that TSB achieves over 20x inference accuracy gap improvement, over 5x training speedup, and weights-to-device mapping cost reduction while requiring less than 0.4% of the original weights to be write-verified during programming, when compared with state-of-the-art baseline solutions. Our approach provides a practical and efficient solution for deploying robust DNN models on NVCIM accelerators, making it a valuable contribution to the field of energy-efficient AI hardware.

What problem does this paper attempt to address?

The paper addresses the loss of inference accuracy and the poor training efficiency caused by device variation when deep neural network (DNN) models are deployed on non-volatile computing-in-memory (NVCIM) accelerators. Specifically:

1. **Impact of device variation**: Non-volatile memory (NVM) devices exhibit inherent variation (for example, cycle-to-cycle and device-to-device variation), so programmed weights are represented imprecisely, which reduces DNN inference accuracy on NVCIM accelerators.
2. **Limitations of existing solutions**:
   - **Accuracy**: Existing noise-injection training methods struggle to keep every weight precise, because the parameter space is huge and each parameter varies independently.
   - **Training efficiency**: These methods add computational overhead, which significantly increases training time.
   - **Deployment complexity**: Programming a single weight onto a device requires multiple read-write operations, which increases deployment time and energy consumption, especially for large models.

To solve these problems, the authors propose a method called "Tiny Shared Block (TSB)". TSB integrates a small shared 1×1 convolution block into the DNN architecture to stabilize feature processing and reduce the impact of device variation. Compared with state-of-the-art baselines, the reported benefits are:

- **Improve inference accuracy**: TSB achieves an improvement of more than 20x in the inference accuracy gap.
- **Accelerate the training process**: TSB speeds up training by more than 5x.
- **Reduce weight-mapping cost**: Fewer than 0.4% of the original weights need to be write-verified during programming, which greatly reduces programming time and operational complexity.
- **Compatible with existing architectures**: TSB is fully compatible with existing accelerator architectures and requires no new functional circuits.

### Formula summary

- Let \( X\in\mathbb{R}^{H'\times W'\times C'} \) and \( V\in\mathbb{R}^{H\times W\times C} \) be the input and output feature maps, respectively. The convolution transformation \( F_{\text{tr}} \) can then be expressed as:

  \[ v_i = k_i * X = \sum_{j=1}^{C'} k_j^i * x_j \]

  where \( k_i \) is the \( i \)-th convolution kernel, \( k_j^i \) is its 2D spatial kernel applied to the \( j \)-th input channel \( x_j \), and \( v_i \) is the \( i \)-th channel of the output feature map \( V \).
- The transformation of TSB can be expressed as:

  \[ U = F_{\text{TSB}}(V) \]

  where \( U=[u_1, u_2,\ldots, u_C] \) is the output feature map of the TSB block.
- The feature set \( V = [v_1, v_2,\ldots, v_C] \) is divided into smaller groups \( V=[g_1, g_2,\ldots, g_N] \), each containing \( C_{\text{TSB}} \) features \( v_i \). The number of groups \( N \) is:

  \[ N=\left\lceil\frac{C}{C_{\text{TSB}}}\right\rceil \]

- Each group is transformed by the shared block:

  \[ g'_n = g_n * W_{\text{block}} \]

  where \( W_{\text{block}} \) is the shared block weight and \( g'_n \) is the transformed feature group.

Through these improvements, the TSB method provides an efficient and robust solution for DNN deployment on NVCIM accelerators.
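
The grouping and shared-block formulas above can be illustrated with a short NumPy sketch. It is not the authors' implementation: it only applies \( g'_n = g_n * W_{\text{block}} \) to each of the \( N=\lceil C/C_{\text{TSB}}\rceil \) channel groups. The function name `tsb_forward`, the \( C_{\text{TSB}}\times C_{\text{TSB}} \) shape of `W_block`, and the zero-padding of the last group when \( C \) is not a multiple of \( C_{\text{TSB}} \) are all assumptions made for illustration. The actual block may contain further elements that the summary does not describe.

```python
import math
import numpy as np

def tsb_forward(V, W_block):
    """Apply one shared 1x1 convolution block to every channel group of V.

    V:       feature map of shape (H, W, C)
    W_block: shared weights, assumed here to have shape (C_TSB, C_TSB)
    Returns U with the same shape as V.
    """
    H, W, C = V.shape
    c_tsb = W_block.shape[0]
    n_groups = math.ceil(C / c_tsb)            # N = ceil(C / C_TSB)
    pad = n_groups * c_tsb - C
    # Assumption: zero-pad the channel axis so the last group is full.
    V_pad = np.pad(V, ((0, 0), (0, 0), (0, pad)))
    # Split the channels into N groups g_1..g_N of C_TSB channels each.
    groups = V_pad.reshape(H, W, n_groups, c_tsb)
    # A 1x1 convolution is a per-pixel matrix product over channels;
    # the same W_block is reused for every group: g'_n = g_n * W_block.
    out = groups @ W_block
    # Merge the groups again and drop the padded channels.
    return out.reshape(H, W, n_groups * c_tsb)[:, :, :C]

# Usage: 10 channels with C_TSB = 4 gives N = 3 groups (the last one padded).
rng = np.random.default_rng(0)
V = rng.standard_normal((4, 4, 10))
W_block = rng.standard_normal((4, 4))
U = tsb_forward(V, W_block)
print(U.shape)  # (4, 4, 10)
```

In this sketch the shared block holds only \( C_{\text{TSB}}^2 \) weights regardless of the channel count \( C \), which suggests why only a small fraction of weights would need write-verification. The exact parameter accounting behind the reported 0.4% figure is not given in the summary.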

TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators

An Emerging NVM CIM Accelerator with Shared-Path Transpose Read and Bit-Interleaving Weight Storage for Efficient On-Chip Training in Edge Devices

Benchmark of the Compute-in-Memory-Based DNN Accelerator With Area Constraint

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU.

Accelerating Deep Neural Networks by Combining Block-Circulant Matrices and Low-Precision Weights

Computing-In-Memory Neural Network Accelerators for Safety-Critical Systems: Can Small Device Variations Be Disastrous?

EF-CIM: an Endurance Friendly CIM Accelerator Using Embedded NVM with Bit-Aware Wear Leveling for Efficient Light-Weight On-Chip Training in Edge Devices

Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training

StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

TensorCIM: Digital Computing-in-Memory Tensor Processor with Multichip-Module-Based Architecture for Beyond-NN Acceleration

A Convolutional Spiking Neural Network Accelerator with the Sparsity-Aware Memory and Compressed Weights

A Heterogeneous Microprocessor for Intermittent AI Inference Using Nonvolatile-SRAM-based Compute-In-Memory

Cambricon-M: A Fibonacci-Coded Charge-Domain SRAM-Based CIM Accelerator for DNN Inference

A Small-Footprint Accelerator for Large-Scale Neural Networks

Weight and Multiply-Accumulation Sparsity-Aware Non-Volatile Computing-in-Memory System

Negative Feedback Training: A Novel Concept to Improve Robustness of NVCIM DNN Accelerators

A Highly Configurable Hardware/Software Stack for DNN Inference Acceleration

Design of Computing-in-Memory (CIM) with Vertical Split-Gate Flash Memory for Deep Neural Network (DNN) Inference Accelerator

Bulk-Switching Memristor-Based Compute-In-Memory Module for Deep Neural Network Training

Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks