A 19.7 TFLOPS/W Multiply-less Logarithmic Floating-Point CIM Architecture with Error-Reduced Compensated Approximate Adder

Mengjie Li, Hongyi Zhang, Siqi He, Haozhe Zhu, Hao Zhang, Jinglei Liu, Jiayuan Chen, Zhenping Hu, Xiaoyang Zeng, Chixiao Chen
DOI: https://doi.org/10.1109/iscas58744.2024.10558433
2024-01-01
Abstract: The growing demand for high-precision neural network training and inference has created a need for floating-point (FP) compute-in-memory (CIM) architectures. Compared with the extensively studied integer CIM (INT-CIM), however, FP-CIM still requires further optimization of its energy efficiency. This work presents an energy-efficient, multiply-less, digital SRAM-based FP-CIM architecture. To improve energy efficiency and minimize area, we employ logarithmic approximate FP multiplication (LAM) within the FP-CIM architecture. LAM approximates an FP multiplication with a simple addition, reducing both power consumption and area. Additionally, we propose an approximate adder with error-reduced compensation that shortens the critical path caused by carry propagation, further reducing power consumption and area overhead. A 24 Kb SRAM CIM macro using the proposed techniques is designed in a 28 nm CMOS technology and occupies 0.033 mm². Simulation results show that the design achieves an energy efficiency of 19.7 TFLOPS/W with the bfloat16 format at 0.9 V and 200 MHz.
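The abstract does not detail the LAM datapath. A well-known way to replace multiplication with addition is Mitchell's logarithmic approximation, log2(1 + m) ≈ m: under it, adding the exponent-and-mantissa fields of two bfloat16 values as integers (and subtracting the exponent bias once) approximates their product, with any mantissa carry flowing naturally into the exponent. The Python sketch below illustrates that general idea at the bit level under stated assumptions; it is not the authors' circuit. The function names are hypothetical, float32-to-bfloat16 conversion truncates, NaN, infinity and subnormals are ignored, and overflow/underflow are simply clamped.

```python
import struct

BF16_BIAS = 127   # bfloat16 exponent bias (same as float32)
MANT_BITS = 7     # bfloat16 stores 7 explicit mantissa bits


def float_to_bf16_bits(x: float) -> int:
    """Hypothetical helper: truncate a float to a bfloat16 bit pattern
    by keeping the upper 16 bits of its float32 encoding."""
    (f32,) = struct.unpack("<I", struct.pack("<f", x))
    return f32 >> 16


def bf16_bits_to_float(b: int) -> float:
    """Hypothetical helper: widen a bfloat16 bit pattern back to a float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def lam_mul_bf16(a_bits: int, b_bits: int) -> int:
    """Mitchell-style logarithmic approximate multiply on bfloat16 bits.

    Assumption: the 15-bit field (exponent << 7 | mantissa) is treated as a
    fixed-point log2 value, so one integer addition replaces the multiplier.
    NaN, infinity and subnormals are not handled.
    """
    sign = ((a_bits ^ b_bits) >> 15) & 1          # sign of the product
    mag_a = a_bits & 0x7FFF
    mag_b = b_bits & 0x7FFF
    if mag_a == 0 or mag_b == 0:                  # zero operand -> signed zero
        return sign << 15
    # Adding log-domain values; subtract the bias once so it is not doubled.
    mag = mag_a + mag_b - (BF16_BIAS << MANT_BITS)
    if mag <= 0:                                  # underflow: flush to zero
        return sign << 15
    if mag >= 0x7F80:                             # overflow: clamp to max finite
        mag = 0x7F7F
    return (sign << 15) | mag


if __name__ == "__main__":
    for a, b in [(2.0, 3.0), (1.5, 1.5), (-2.0, 3.0)]:
        approx = bf16_bits_to_float(
            lam_mul_bf16(float_to_bf16_bits(a), float_to_bf16_bits(b)))
        print(f"{a} * {b}: exact {a * b}, LAM approx {approx}")
```

With this approximation, 2.0 × 3.0 is computed exactly (6.0) because one operand has a zero mantissa, while 1.5 × 1.5 yields 2.0 instead of 2.25, which is the worst-case relative error of Mitchell's method (about 11.1%).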