TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators

Yifan Qin, Zheyu Yan, Zixuan Pan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi
2024-08-22
Abstract: Compute-in-memory (CIM) accelerators using non-volatile memory (NVM) devices offer promising solutions for energy-efficient and low-latency Deep Neural Network (DNN) inference execution. However, practical deployment is often hindered by the challenge of dealing with the massive amount of model weight parameters impacted by the inherent device variations within non-volatile computing-in-memory (NVCIM) accelerators. This issue significantly offsets their advantages by increasing training overhead, the time and energy needed for mapping weights to device states, and diminishing inference accuracy. To mitigate these challenges, we propose the "Tiny Shared Block (TSB)" method, which integrates a small shared 1x1 convolution block into the DNN architecture. This block is designed to stabilize feature processing across the network, effectively reducing the impact of device variation. Extensive experimental results show that TSB achieves over 20x inference accuracy gap improvement, over 5x training speedup, and weights-to-device mapping cost reduction while requiring less than 0.4% of the original weights to be write-verified during programming, when compared with state-of-the-art baseline solutions. Our approach provides a practical and efficient solution for deploying robust DNN models on NVCIM accelerators, making it a valuable contribution to the field of energy-efficient AI hardware.
Hardware Architecture, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the loss of inference accuracy and the poor training efficiency caused by device variation when deep neural network (DNN) models are deployed on non-volatile compute-in-memory (NVCIM) accelerators. Specifically:

1. **Impact of device variation**: The inherent variations of non-volatile memory (NVM) devices (such as inter-cycle and inter-device variations) lead to imprecise weight representation, which reduces the inference accuracy of DNNs on NVCIM accelerators.
2. **Limitations of existing solutions**:
   - **Accuracy**: Existing noise-injection training methods struggle to make every weight robust, because the parameter space is huge and each parameter varies independently.
   - **Training efficiency**: These methods add computational overhead, which significantly increases training time.
   - **Deployment complexity**: Programming a single weight to a device requires multiple read-write operations, which increases deployment time and energy consumption, especially for large models.

To solve these problems, the authors propose a method called "Tiny Shared Block (TSB)". TSB integrates a small shared 1×1 convolution block into the DNN architecture to stabilize feature processing and reduce the impact of device variation. The reported benefits are:

- **Improved inference accuracy**: TSB improves the inference accuracy gap by more than 20× compared with state-of-the-art baseline methods.
- **Faster training**: TSB speeds up training by more than 5×.
- **Lower weight-mapping cost**: TSB requires write-verification on less than 0.4% of the original weights, which greatly reduces programming time and energy.
- **Compatibility with existing architectures**: TSB is fully compatible with existing accelerator architectures and needs no new functional circuits.

### Formula summary

- Let \( X\in\mathbb{R}^{H'\times W'\times C'} \) and \( V\in\mathbb{R}^{H\times W\times C} \) be the input and output feature maps, respectively. The convolution transformation \( F_{\text{tr}} \) can then be expressed as:
  \[ v_i = k_i * X = \sum_{j=1}^{C'} k_j^i * x_j \]
  where \( * \) denotes convolution, \( k_i \) is the \( i \)-th convolution kernel, \( k_j^i \) is the 2D spatial kernel of \( k_i \) that acts on the \( j \)-th input channel \( x_j \), and \( v_i \) is the \( i \)-th channel of the output feature map \( V \).
- The transformation of TSB can be expressed as:
  \[ U = F_{\text{TSB}}(V) \]
  where \( U=[u_1, u_2,\ldots, u_C] \) is the output feature map of the TSB block.
- The feature set \( V = [v_1, v_2,\ldots, v_C] \) is divided into smaller groups \( V=[g_1, g_2,\ldots, g_N] \), each containing \( C_{\text{TSB}} \) features \( v_i \). The number of groups \( N \) is:
  \[ N=\left\lceil\frac{C}{C_{\text{TSB}}}\right\rceil \]
- Each group is transformed by the same shared block:
  \[ g'_n = g_n * W_{\text{block}} \]
  where \( W_{\text{block}} \) is the shared block weight and \( g'_n \) is the transformed feature group.

With these changes, the TSB method provides an efficient and robust solution for DNN deployment on NVCIM accelerators.
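
The grouping and shared-block transformation in the formula summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `tsb_forward`, the square \( C_{\text{TSB}}\times C_{\text{TSB}} \) shape of `w_block`, and the zero-padding of the last group when \( C \) is not a multiple of \( C_{\text{TSB}} \) are all assumptions made for illustration. It relies only on the fact that a 1×1 convolution is a per-pixel matrix multiply over the channel axis.

```python
import math
import numpy as np


def tsb_forward(v, w_block):
    """Apply one shared 1x1 convolution block to every channel group of v.

    v:       feature map V with shape (H, W, C)
    w_block: shared weights W_block, assumed square with shape (C_tsb, C_tsb)
    Returns the output feature map U with shape (H, W, C) and the group count N.
    """
    h, w, c = v.shape
    c_tsb = w_block.shape[0]

    # N = ceil(C / C_TSB), as in the formula summary.
    n_groups = math.ceil(c / c_tsb)

    # Assumption: zero-pad the channel axis so the last group is full.
    pad = n_groups * c_tsb - c
    v_padded = np.pad(v, ((0, 0), (0, 0), (0, pad)))

    # Split channels into groups g_1..g_N, each with C_tsb channels.
    groups = v_padded.reshape(h, w, n_groups, c_tsb)

    # g'_n = g_n * W_block: the same weights are reused for every group.
    transformed = groups @ w_block

    # Merge the groups back and drop the padded channels.
    u = transformed.reshape(h, w, n_groups * c_tsb)[..., :c]
    return u, n_groups


# Example: C = 6 channels, C_tsb = 4, so N = ceil(6 / 4) = 2 groups.
v = np.arange(2 * 2 * 6, dtype=float).reshape(2, 2, 6)
w_block = 2.0 * np.eye(4)  # hypothetical shared weights that double each channel
u, n = tsb_forward(v, w_block)
```

Because every group reuses the same `w_block`, only those \( C_{\text{TSB}}^2 \) values in this sketch would be shared across the whole feature map, which is consistent with the summary's point that only a small fraction of weights needs careful programming.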