A 1.97 TFLOPS/W Configurable SRAM-Based Floating-Point Computation-in-Memory Macro for Energy-Efficient AI Chips.

Yangzhan Mai, Mingyu Wang, Chuanghao Zhang, Baiqing Zhong, Zhiyi Yu
DOI: https://doi.org/10.1109/iscas46773.2023.10182197
2023-01-01
Abstract: Floating-point (FP) computation-in-memory (CIM) technology is increasingly in demand for low-power neural network training. In this work, we propose an energy-efficient configurable SRAM-based FP CIM macro. A mantissa parallel alignment method is proposed to improve calculation speed and accuracy in FP multiply-accumulate (MAC) operations. Separate mantissa and exponent CIM arrays are designed so that exponent and mantissa operations can be pipelined, increasing computation throughput. Furthermore, the macro can be flexibly set to BF16 or FP32 precision by configuring its accumulators. The proposed FP CIM macro is analyzed in 40 nm CMOS technology, and the estimated area is 0.48 mm². Simulation results show that the macro achieves a frequency of 294 MHz at 1.1 V. In BF16 mode, the macro achieves a peak throughput of 56.5 GFLOPS and an energy efficiency of 1.97 TFLOPS/W, while in FP32 mode the peak throughput and energy efficiency are 16 GFLOPS and 0.62 TFLOPS/W.
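The abstract does not spell out how the mantissa alignment works, so the following is only a software sketch of the general idea behind exponent-aligned FP MAC, not the paper's circuit. It assumes a common scheme: each operand is split into an integer mantissa and an exponent, product exponents are summed on a separate path, every product mantissa is shifted to the largest product exponent, and the aligned mantissas are accumulated as plain integers. The names `decompose` and `aligned_mac`, the 8-bit significand width, and the truncating shift are all assumptions made for illustration.

```python
import math

# Assumed significand width (BF16: 7 stored bits plus the hidden bit).
MANT_BITS = 8


def decompose(x, mant_bits=MANT_BITS):
    """Split x into (integer mantissa, exponent) so that x ~= mant * 2**exp."""
    if x == 0.0:
        return 0, 0
    frac, exp = math.frexp(x)  # x = frac * 2**exp with 0.5 <= |frac| < 1
    mant = round(frac * (1 << mant_bits))  # quantize to mant_bits bits
    return mant, exp - mant_bits


def aligned_mac(weights, inputs, mant_bits=MANT_BITS):
    """Dot product using max-exponent alignment and integer accumulation."""
    products = []
    for w, a in zip(weights, inputs):
        mw, ew = decompose(w, mant_bits)
        ma, ea = decompose(a, mant_bits)
        # Mantissa path: integer multiply. Exponent path: integer add.
        products.append((mw * ma, ew + ea))

    exponents = [e for m, e in products if m != 0]
    if not exponents:
        return 0.0
    e_max = max(exponents)

    acc = 0
    for m, e in products:
        if m == 0:
            continue
        shift = e_max - e  # all shifts depend only on e_max, so they can run in parallel
        sign = 1 if m > 0 else -1
        acc += sign * (abs(m) >> shift)  # truncate toward zero (assumed rounding)
    return math.ldexp(acc, e_max)  # acc * 2**e_max


if __name__ == "__main__":
    print(aligned_mac([1.5, -0.25, 2.0], [2.0, 4.0, 0.5]))
```

Because the exponent work (add, then find the maximum) is independent of the mantissa work (multiply, shift, accumulate), the two paths can be overlapped in a pipeline, which is the property the abstract attributes to the separated exponent and mantissa CIM. The truncating shift also shows where alignment costs accuracy: bits of small products that fall below the largest exponent's range are dropped.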