A 28-nm Floating-Point Computing-in-Memory Processor Using Intensive-CIM Sparse-Digital Architecture
Shengzhe Yan, Jinshan Yue, Chaojie He, Zi Wang, Zhaori Cong, Yifan He, Mufeng Zhou, Wenyu Sun, Xueqing Li, Chunmeng Dou, Feng Zhang, Huazhong Yang, Yongpan Liu, Ming Liu
DOI: https://doi.org/10.1109/jssc.2024.3363871
IF: 5.4
2024-01-01
IEEE Journal of Solid-State Circuits
Abstract: Computing-in-memory (CIM) chips have demonstrated promising energy efficiency on multiply–accumulate (MAC) operations for artificial intelligence (AI) applications. Although integer (INT) CIM chips are emerging, floating-point (FP) CIM chips have not been well explored. Larger models and more complex tasks demand the higher accuracy of FP computation, and most neural network (NN) training tasks still rely on it. This work presents an energy-efficient FP CIM processor. It is observed that most of the exponent values of FP data are concentrated in a small region. Therefore, the FP computations are divided into intensive and sparse parts and then executed on an intensive-CIM sparse-digital architecture. First, an FP-to-INT CIM workflow is designed for the intensive FP operations to reduce the CIM execution cycles. Second, a flexible sparse-digital core is proposed for the remaining sparse FP operations. By using both the intensive-CIM and sparse-digital cores, this work achieves high energy efficiency together with accuracy identical to the FP algorithm baseline. Considering the FP CIM execution flow, a CIM-friendly low-bit FP training method is proposed to further reduce the execution cycles. In addition, a low-MAC-value (MACV) CIM macro is designed to exploit the more random sparsity introduced by FP alignment. The fabricated 28-nm chip shows macro energy efficiency of 275–1615 TOPS/W@INT4 and 17.2–91.3 TOPS/W@FP16, from the dense case to the average sparsity on the tested models.
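The intensive/sparse split described in the abstract can be illustrated with a small software sketch. It is not the chip's dataflow, which the abstract does not detail. The function names (`pick_window`, `to_aligned_int`, `hybrid_dot`), the window width of 4 exponents, and the rule that a MAC goes to the sparse path when either operand falls outside its window are all assumptions made for illustration. Inputs are assumed to be exactly representable FP16 normal values with at least one nonzero entry per vector.

```python
import math
from collections import Counter

MANT_BITS = 11  # FP16 significand width, including the hidden bit


def pick_window(values, window):
    """Return the lowest exponent of the `window`-wide range holding the most nonzero values."""
    counts = Counter(math.frexp(v)[1] for v in values if v != 0.0)

    def covered(hi):
        # Number of values whose exponent lies in [hi - window + 1, hi].
        return sum(counts[e] for e in range(hi - window + 1, hi + 1))

    hi = max(counts, key=covered)
    return hi - window + 1


def to_aligned_int(v, lo, window):
    """Align v to the shared exponent `lo`; return None if v lies outside the window."""
    if v == 0.0:
        return 0
    e = math.frexp(v)[1]
    if not (lo <= e < lo + window):
        return None
    # For an FP16-representable v this shift yields an exact integer.
    return int(math.ldexp(v, MANT_BITS - lo))


def hybrid_dot(a, b, window=4):
    """Dot product with an integer 'intensive' path and an FP 'sparse' path."""
    lo_a, lo_b = pick_window(a, window), pick_window(b, window)
    int_acc = 0        # integer MACs (stand-in for the intensive-CIM path)
    sparse_acc = 0.0   # FP MACs (stand-in for the sparse-digital path)
    n_sparse = 0
    for x, y in zip(a, b):
        ix = to_aligned_int(x, lo_a, window)
        iy = to_aligned_int(y, lo_b, window)
        if ix is not None and iy is not None:
            int_acc += ix * iy
        else:
            sparse_acc += x * y
            n_sparse += 1
    # Undo the two alignment shifts applied to the integer operands.
    scale = lo_a + lo_b - 2 * MANT_BITS
    return math.ldexp(float(int_acc), scale) + sparse_acc, n_sparse


a = [1.5, 0.75, -1.25, 2.0, 0.5, 96.0]   # 96.0 is an exponent outlier
b = [0.5, 1.0, 0.25, -0.75, 1.5, 0.125]
result, n_sparse = hybrid_dot(a, b)
reference = sum(x * y for x, y in zip(a, b))
print(result, reference, n_sparse)
```

In this toy input only the pair containing the outlier `96.0` takes the FP path, and because the alignment shift is lossless inside the window, the hybrid result matches the plain FP dot product. A wider window sends fewer MACs to the sparse path at the cost of wider integer operands.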
engineering, electrical & electronic