A 28nm 64-Kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs
An Guo, Xin Si, Xi Chen, Fangyuan Dong, Xingyu Pu, Dongqi Li, Yongliang Zhou, Lizheng Ren, Yeyang Xue, Xueshan Dong, Hui Gao, Yiran Zhang, Jingmin Zhang, Yuyao Kong, Tianzhu Xiong, Bo Wang, Hao Cai, Weiwei Shan, Jun Yang
DOI: https://doi.org/10.1109/isscc42615.2023.10067260
2023-01-01
Abstract: SRAM-based computing-in-memory (SRAM-CIM) has been intensively studied and developed to improve the energy and area efficiency of AI devices. SRAM-CIMs have effectively implemented high-precision integer (INT) multiply-and-accumulate (MAC) operations to improve the inference accuracy of various image classification tasks [1]–[3],[5],[6]. To realize more complex AI tasks, such as detection and segmentation, and to support on-chip training for better inference accuracy, floating-point MAC (FP-MAC) operations with high energy efficiency are required. However, most previous SRAM-CIMs, whether based on digital [5], [6] or analog [1]–[4] in-memory computing, cannot effectively support FP-MACs, e.g., with the Brain Float16 (BF16) datatype. This is because supporting high-precision floating-point input (IN), weight (W) and output (OUT) in SRAM-CIM can (1) cause inconsistency between the shift-alignment of conventional digital FP-MACs and the structured mapping of most SRAM-CIMs, and (2) result in a more difficult tradeoff between throughput/memory size (T/S), energy efficiency (EF), and memory density (MD), as shown in Fig. 7.2.1.
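To make the shift-alignment step mentioned in the abstract concrete, the following is a minimal software sketch of a conventional alignment-based FP-MAC on BF16 operands. It is a generic illustration, not the macro's circuit or the authors' method. The function names `to_bf16_fields` and `aligned_fp_mac` are hypothetical, and the sketch assumes BF16 conversion by truncation, subnormals flushed to zero, and no Inf/NaN handling.

```python
import struct


def to_bf16_fields(x: float):
    """Truncate x to BF16 and return (sign, biased_exponent, 8-bit mantissa).

    Assumption: BF16 is taken as the upper 16 bits of the float32 encoding
    (truncation, no rounding). Subnormals are flushed to zero.
    """
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16
    sign = bits16 >> 15
    exp = (bits16 >> 7) & 0xFF
    frac = bits16 & 0x7F
    # Restore the hidden leading 1 for normal numbers.
    mant = (frac | 0x80) if exp != 0 else 0
    return sign, exp, mant


def aligned_fp_mac(inputs, weights):
    """Dot product using exponent alignment followed by integer accumulation.

    Inf/NaN encodings (exponent 255) are not handled in this sketch.
    """
    products = []
    for x, w in zip(inputs, weights):
        sx, ex, mx = to_bf16_fields(x)
        sw, ew, mw = to_bf16_fields(w)
        if mx == 0 or mw == 0:
            continue  # zero products contribute nothing
        # Sign by XOR, exponents add, mantissas multiply (8b x 8b -> 16b).
        products.append((sx ^ sw, ex + ew, mx * mw))
    if not products:
        return 0.0

    # Shift-alignment: bring every product to the largest product exponent.
    e_max = max(e for _, e, _ in products)
    acc = 0
    for s, e, m in products:
        aligned = m >> (e_max - e)  # right shift discards low-order bits
        acc += -aligned if s else aligned

    # Each mantissa carries 7 fractional bits, so a product carries 14.
    # Each exponent carries a bias of 127, so a product carries 2 * 127.
    return acc * 2.0 ** (e_max - 2 * 127 - 14)
```

The per-product shift amount `e_max - e` depends on the data, which is the kind of irregularity that sits uneasily with a fixed, structured mapping of operands onto memory columns. The right shift also truncates: in this sketch `aligned_fp_mac([256.0, 1.0078125], [1.0, 1.0078125])` drops the low bits of the smaller product before accumulation.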