Partial Sum Quantization for Computing-In-Memory-Based Neural Network Accelerator

Jinyu Bai, Wenlu Xue, Yunqian Fan, Sifan Sun, Wang Kang
DOI: https://doi.org/10.1109/tcsii.2023.3246562
2023-01-01
IEEE Transactions on Circuits and Systems II: Express Briefs
Abstract: Computing-in-memory (CIM) has proven to be an effective hardware platform for improving the performance and efficiency of convolutional neural networks (CNNs). However, because a memory array has a limited size, the input and weight matrices of a convolution operation must be split into sub-matrices, each of which produces partial sums that are later accumulated. High-resolution analog-to-digital converters (ADCs) are generally used to read out these partial sums so that computing precision is maintained, but at a high cost in area and energy. Partial sum quantization (PSQ), which can significantly reduce the required ADC resolution, remains an open problem in this field. This brief proposes a novel PSQ approach for CIM that uses post-training quantization based on a newly defined array-wise granularity. In addition, because the non-linear transfer function of the ADCs severely degrades accuracy, a gradient estimation method based on smooth approximation is proposed to address this problem. Experiments on various CNNs show that the required ADC resolution can be reduced from 11 bits to as few as 3 bits with only a slight accuracy loss (about 1.63%), while energy efficiency increases by up to 224%.
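To make the partial-sum pipeline concrete, here is a minimal pure-Python sketch. It splits a matrix-vector product across memory arrays of a fixed row count, passes each array's partial sums through a low-resolution ADC model, and accumulates the results digitally. Everything here is an assumption, not the authors' method. The analog MAC is idealized and noise-free, and the ADC is a uniform signed quantizer. "Array-wise granularity" is read here as one min-max scale shared by all columns of an array. The smooth approximation is a sum-of-sigmoids surrogate for the ADC staircase, with its analytic derivative used as the estimated gradient. All function names and the `temp` parameter are hypothetical.

```python
# Illustrative sketch only: the ADC model, per-array min-max calibration and
# sigmoid-based smoothing are assumptions, not the paper's exact method.
import math
from typing import List, Sequence

Matrix = List[List[float]]


def split_rows(num_rows: int, rows_per_array: int) -> List[range]:
    """Row ranges of the weight matrix mapped onto each memory array (tile)."""
    return [range(s, min(s + rows_per_array, num_rows))
            for s in range(0, num_rows, rows_per_array)]


def partial_sums(x: Sequence[float], W: Matrix, rows: range) -> List[float]:
    """Ideal (noise-free) analog MAC of one array: one partial sum per column."""
    return [sum(x[i] * W[i][j] for i in rows) for j in range(len(W[0]))]


def adc(value: float, scale: float, bits: int) -> float:
    """Uniform signed ADC: round half up to the nearest level, clip to b bits."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    level = max(q_min, min(q_max, math.floor(value / scale + 0.5)))
    return level * scale


def calibrate_array_scales(calib_inputs: Sequence[Sequence[float]], W: Matrix,
                           rows_per_array: int, bits: int) -> List[float]:
    """Post-training calibration (assumed min-max): one scale per array."""
    q_max = 2 ** (bits - 1) - 1
    scales = []
    for rows in split_rows(len(W), rows_per_array):
        peak = max(abs(p) for x in calib_inputs for p in partial_sums(x, W, rows))
        scales.append(peak / q_max if peak > 0 else 1.0)
    return scales


def cim_matvec(x: Sequence[float], W: Matrix, rows_per_array: int,
               bits: int, scales: Sequence[float]) -> List[float]:
    """Quantize each array's partial sums with its own scale, then add digitally."""
    out = [0.0] * len(W[0])
    for rows, s in zip(split_rows(len(W), rows_per_array), scales):
        for j, p in enumerate(partial_sums(x, W, rows)):
            out[j] += adc(p, s, bits)
    return out


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_adc(value: float, scale: float, bits: int, temp: float) -> float:
    """Smooth ADC surrogate: each hard step at a threshold becomes a sigmoid."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    u = value / scale
    # Decision thresholds sit halfway between adjacent levels: q_min+0.5 ... q_max-0.5.
    steps = sum(_sigmoid(temp * (u - (k + 0.5))) for k in range(q_min, q_max))
    return (q_min + steps) * scale


def soft_adc_grad(value: float, scale: float, bits: int, temp: float) -> float:
    """Analytic d soft_adc / d value, standing in for the staircase's zero gradient."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    u = value / scale
    grad = 0.0
    for k in range(q_min, q_max):
        sig = _sigmoid(temp * (u - (k + 0.5)))
        grad += temp * sig * (1.0 - sig)  # the scale factors cancel
    return grad
```

Raising `temp` makes `soft_adc` approach the hard staircase of `adc`, but it also concentrates the gradient into narrow spikes at the decision thresholds. Smaller values give smoother gradients at the cost of a looser fit. With `bits=3` each array's partial sums take only 8 levels, which is the kind of low-resolution regime the abstract targets.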