CIM2PQ: an Array-Wise and Hardware-Friendly Mixed Precision Quantization Method for Analog Computing-In-Memory
Sifan Sun,Jinyu Bai,Zhaoyu Shi,Weisheng Zhao,Wang Kang
DOI: https://doi.org/10.1109/tcad.2024.3358609
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Computing-in-memory (CIM) architecture is a promising convolutional neural network (CNN) accelerator known for its highly efficient matrix-vector multiplications (MVMs). However, because CIM memory arrays compute at low precision and are limited in size, large MVMs must be decomposed into smaller subsets. Conventional neural network (NN) quantization methods overlook the characteristics of CIM hardware, resulting in diminished system performance and efficiency. This paper proposes CIM2PQ, a mixed precision quantization (MPQ) method for CIM-based accelerators that is based on an evolutionary algorithm and takes the hardware characteristics of CIM into account. It automatically generates quantization strategies for NN models to improve the efficiency of CIM systems. First, inspired by the CIM computing paradigm, an array-wise quantization granularity is introduced in the MPQ search space, which can jointly quantize the inputs, weights, and partial sums. Second, a production procedure containing fine-grained crossover and progressive adaptive mutation is proposed, which can efficiently explore the search space and speed up the search process. Third, we propose a fast and efficient strategy evaluation method to obtain the performance of a quantization strategy on the CIM platform, significantly reducing evaluation time without requiring fine-tuning. Finally, to protect CIM-friendly strategies with lower bit-widths but worse algorithm performance, we propose a strategy selection method based on multi-objective optimization, named qNSGA-III. The effectiveness of the proposed method is demonstrated through experimental results on various NNs and datasets. For ResNet-18, the hardware efficiency and accuracy can be improved to 117% with 7.05%, 113% with 3.37%, and 119% with 5.78%, on CIFAR-10, CIFAR-100 and ImageNet, respectively, compared to the baseline MPQ method.
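To make the array-wise granularity concrete, the following is a minimal sketch of an MVM decomposed into array-sized tiles, where each tile carries its own bit-widths for inputs, weights, and partial sums. It is not the authors' implementation: the uniform symmetric quantizer, the per-tile dynamic ranges, and the names `quantize`, `arraywise_mvm`, `array_rows`, and `strategy` are all assumptions chosen for illustration.

```python
def quantize(values, bits, max_abs):
    """Uniform symmetric quantization (an assumed quantizer, not the paper's)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    if max_abs == 0:
        return list(values)
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    step = max_abs / levels
    return [max(-levels, min(levels, round(v / step))) * step for v in values]


def arraywise_mvm(weights, x, array_rows, strategy):
    """Compute y[j] = sum_i x[i] * weights[i][j] one tile of rows at a time.

    weights:    list of n_in rows, each of length n_out
    x:          input vector of length n_in
    array_rows: number of rows one (hypothetical) CIM array can hold
    strategy:   one (input_bits, weight_bits, psum_bits) tuple per tile
    """
    n_in, n_out = len(weights), len(weights[0])
    y = [0.0] * n_out
    for tile, start in enumerate(range(0, n_in, array_rows)):
        in_bits, w_bits, psum_bits = strategy[tile]
        rows = weights[start:start + array_rows]
        xs = x[start:start + array_rows]

        # Quantize this tile's weights and inputs with its own bit-widths.
        w_max = max(abs(w) for row in rows for w in row)
        x_max = max(abs(v) for v in xs)
        q_rows = [quantize(row, w_bits, w_max) for row in rows]
        q_xs = quantize(xs, in_bits, x_max)

        # Partial sum produced by this tile, then quantized (ADC-like step).
        psum = [sum(q_xs[i] * q_rows[i][j] for i in range(len(rows)))
                for j in range(n_out)]
        p_max = max(abs(p) for p in psum)
        psum = quantize(psum, psum_bits, p_max)

        # Accumulate the quantized partial sums across tiles.
        y = [a + b for a, b in zip(y, psum)]
    return y


weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [1.0, 0.5, -1.0, 2.0]
# Two tiles of two rows each; the second tile uses lower precision.
y_mixed = arraywise_mvm(weights, x, array_rows=2, strategy=[(8, 8, 8), (4, 4, 6)])
```

In this sketch a quantization strategy is simply a list of bit-width triples, one per tile, which is the kind of per-array decision an MPQ search would have to make.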