SRAM-Based Processing-In-Memory Design with Kullback-Leibler Divergence-Based Dynamic Precision Quantization

Yanjun Li, Chunshan Zu, Bingqian Wang, Zhenhua Zhu, Yaojun Zhang, Ran Duan, Bing Li, Bonan Yan
DOI: https://doi.org/10.1145/3583781.3590306
2023-01-01
Abstract: Deep convolutional neural networks (CNNs) are widely used in Artificial Intelligence of Things (AIoT) systems. Limited by power and area, conventional edge devices are insufficient to handle the computational cost of CNNs. SRAM-based Processing-In-Memory (SRAM-PIM) has been advocated for implementing CNNs on edge devices because of its high area and power efficiency. To further exploit the potential of SRAM-PIM for edge inference, this paper proposes an SRAM-PIM design with Kullback-Leibler (KL) divergence-based dynamic precision quantization. The proposed quantization method decouples the effects of different CNN layers on accuracy and incorporates SRAM-PIM hardware performance into quantization, realizing SRAM-PIM-aware layer-wise precision adjustment. The proposed SRAM-PIM design has been applied to image classification tasks on edge devices. Our evaluation shows that the implemented design achieves up to 2.03x energy-efficiency improvement and 2.54% accuracy improvement compared with an existing dynamic precision PIM design. Compared with an existing reinforcement-learning-based dynamic quantization method that requires several hours of quantization time, the proposed dynamic precision quantization method takes only 26.28 µs to obtain the optimal quantization results.
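The abstract does not spell out the quantization algorithm, so the following is only a minimal sketch of the general idea of KL-divergence-guided layer-wise precision selection, not the authors' method. It assumes symmetric uniform quantization, a histogram-based KL(P || Q) between full-precision and quantized values with additive smoothing of Q, a fixed candidate set of bit-widths, a single KL threshold, and that a lower bit-width means lower SRAM-PIM energy (as in bit-serial input feeding). All function names and parameters (`select_layer_bits`, `kl_threshold`, `bins`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical sketch: KL-divergence-guided layer-wise bit-width selection.
# Not the paper's algorithm; all names and parameters are illustrative.


def histogram(values, lo, hi, bins):
    """Count values in `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp edge values into outer bins
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-6):
    """KL(P || Q) between two histograms over the same bins.

    Q is additively smoothed by `eps` so that empty quantized bins do not
    produce log(p / 0).
    """
    p_total = sum(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        if pc == 0:
            continue  # terms with p = 0 contribute nothing
        p = pc / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def quantize(values, bits):
    """Symmetric uniform quantization to `bits` bits (bits >= 2), scaled by max |v|."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    levels = 2 ** (bits - 1) - 1  # e.g. 4 bits -> integer range [-7, 7]
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def select_layer_bits(layers, candidate_bits=(2, 4, 6, 8), kl_threshold=0.01, bins=32):
    """Pick, per layer, the lowest candidate bit-width whose KL divergence from
    the full-precision histogram stays within `kl_threshold`.

    Assumes lower precision means lower SRAM-PIM energy, so the first bit-width
    that passes is taken; falls back to the highest candidate if none passes.
    """
    ordered = sorted(candidate_bits)
    chosen = {}
    for name, values in layers.items():
        max_abs = max(abs(v) for v in values)
        if max_abs == 0:
            chosen[name] = ordered[0]  # an all-zero tensor needs no extra precision
            continue
        ref = histogram(values, -max_abs, max_abs, bins)
        chosen[name] = ordered[-1]
        for bits in ordered:
            q = histogram(quantize(values, bits), -max_abs, max_abs, bins)
            if kl_divergence(ref, q) <= kl_threshold:
                chosen[name] = bits
                break
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    layers = {
        "conv1": [rng.gauss(0.0, 1.0) for _ in range(5000)],
        "fc": [rng.uniform(-1.0, 1.0) for _ in range(5000)],
    }
    print(select_layer_bits(layers))
```

Taking the first bit-width that passes is a greedy choice. A more explicitly hardware-aware variant could instead minimize a weighted sum of KL divergence and an estimated per-layer PIM energy, but how the paper combines accuracy and hardware cost is not stated in the abstract.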