SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks

Yen-Wen Chen, Rui-Hsuan Wang, Yu-Hsiang Cheng, Chih-Cheng Lu, Meng-Fan Chang, Kea-Tiong Tang
DOI: https://doi.org/10.1109/tcad.2024.3358583
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Convolutional neural networks (CNNs) play a key role in many deep learning applications; however, these networks are resource-intensive. The parallel computing ability of computing-in-memory (CIM) enables high energy efficiency in artificial intelligence accelerators. When implementing a CNN in CIM, quantization and pruning are indispensable for reducing computational complexity and improving the efficiency of hardware calculations. Mixed-precision quantization with flexible bit widths provides a better efficiency-accuracy trade-off than fixed-precision quantization. However, CIM calculations for mixed-precision models are inefficient because the fixed capacity of CIM macros is redundant for hybrid-precision distributions. To address this, we propose a software-hardware co-designed SRAM-based CIM architecture called SUN, which includes a CIM-adaptive mixed-precision joint pruning-quantization algorithm and a dynamic hybrid-precision CNN accelerator. Three techniques are implemented in this architecture: (1) a mixed-precision joint pruning algorithm for reducing memory access and removing redundant computation, (2) a CIM-adaptive filter-wise and paired mixed-precision quantization for improving CIM macro utilization, and (3) an SRAM-based CIM CNN accelerator in which the SRAM CIM macro is used as the processing element to support sparse and mixed-precision CNN computation with high CIM macro utilization. This architecture achieves a system area efficiency of 428.2 TOPS/mm² and a throughput of 792.2 GOPS on the CIFAR-10 dataset.
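The utilization problem the abstract describes can be illustrated with a toy model. The sketch below is not the SUN algorithm; it assumes a hypothetical macro made of fixed-width weight slots (`slot_bits`) and shows why placing two low-precision filters in one slot can raise utilization compared with giving every filter its own slot. The function names, the greedy pairing rule, and the example bit widths are all illustrative assumptions.

```python
from typing import List, Tuple


def pair_filters(bit_widths: List[int], slot_bits: int) -> List[Tuple[int, ...]]:
    """Greedily pack filters into fixed-width slots, at most two per slot.

    Hypothetical toy model: the widest remaining filter is paired with the
    narrowest remaining one whenever their bit widths fit in a single slot.
    """
    widths = sorted(bit_widths)
    i, j = 0, len(widths) - 1
    slots: List[Tuple[int, ...]] = []
    while i <= j:
        if i < j and widths[i] + widths[j] <= slot_bits:
            slots.append((widths[i], widths[j]))  # two filters share one slot
            i += 1
        else:
            slots.append((widths[j],))  # widest filter occupies a slot alone
        j -= 1
    return slots


def utilization(slots: List[Tuple[int, ...]], slot_bits: int) -> float:
    """Fraction of the allocated slot capacity that holds weight bits."""
    used = sum(sum(slot) for slot in slots)
    return used / (slot_bits * len(slots))


if __name__ == "__main__":
    widths = [2, 6, 4, 4, 8, 2]  # assumed per-filter bit widths
    unpaired = [(w,) for w in widths]
    paired = pair_filters(widths, slot_bits=8)
    print(f"unpaired utilization: {utilization(unpaired, 8):.4f}")
    print(f"paired utilization:   {utilization(paired, 8):.4f}")
```

With these assumed bit widths, pairing reduces the slot count from six to four, so the same weight bits occupy a larger fraction of the allocated capacity.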
engineering, electrical & electronic; computer science, interdisciplinary applications; hardware & architecture