34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.
An Guo, Xi Chen, Fangyuan Dong, Jinwu Chen, Zhihang Yuan, Xing Hu, Yuanpeng Zhang, Jingmin Zhang, Yuchen Tang, Zhican Zhang, Gang Chen, Dawei Yang, Zhaoyang Zhang, Lizheng Ren, Tianzhu Xiong, Bo Wang, Bo Liu, Weiwei Shan, Xinning Liu, Hao Cai, Guangyu Sun, Jun Yang, Xin Si
DOI: https://doi.org/10.1109/ISSCC49657.2024.10454278
2024-01-01
Abstract: SRAM-based computing-in-memory (CIM) has made significant progress in improving the energy efficiency (EF) of neural operators, specifically MAC, used in AI applications. Prior CIM methods have demonstrated attractive EF under fixed or reduced accumulation length, sparsity, toggle rate, and bit precision [1]–[6]. Analog CIMs (ACIM) offer potentially higher EF but are susceptible to PVT variations. Digital CIMs (DCIM), on the other hand, are robust but provide only moderate EF. In prior weight-wise-cut structures [1], INT8 MUL operations incur a calculation error because the low-place-value weight directly influences the result, which leads to accuracy loss. Prior vertical-cut structures [2] use more digital components to ensure inference accuracy; however, each bank requires 64 local-computing cells, which results in a large area overhead and high power consumption. As depicted in Fig. 34.3.1, challenges arise when applying these approaches to higher bit-precision MAC operations: (1) balancing the tradeoff between inference accuracy and the area/energy overhead of hybrid analog-digital CIM; (2) the significant energy consumption and error accumulation of the readout circuit in ACIM; and (3) the limited EF and performance of DCIM due to the large-scale adder tree.
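To make the accuracy tradeoff in challenge (1) concrete, the following is a minimal Python sketch of a hybrid INT8 MAC. It is a toy model written for illustration, not the macro's circuit or the structures of [1] or [2]. It assumes each weight is split into a signed high nibble, accumulated exactly as a digital path would, and an unsigned low nibble, whose partial sum passes through a uniform quantizer standing in for an analog readout. The function names, the 4/4 bit split, and the `step` parameter are all hypothetical.

```python
def split_weight(w):
    """Split a signed INT8 weight into (signed high nibble, unsigned low nibble)."""
    low = w & 0xF            # 4 least-significant bits, range 0..15
    high = (w - low) >> 4    # remaining signed part, range -8..7
    return high, low


def quantize(x, step):
    """Toy readout model (assumption): snap x to a nearby multiple of step."""
    return step * ((x + step // 2) // step)


def exact_mac(inputs, weights):
    """Reference MAC computed at full precision."""
    return sum(x * w for x, w in zip(inputs, weights))


def hybrid_mac(inputs, weights, step):
    """High nibbles accumulate exactly; the low-nibble sum is quantized."""
    high_sum = 0
    low_sum = 0
    for x, w in zip(inputs, weights):
        high, low = split_weight(w)
        high_sum += x * high
        low_sum += x * low
    return 16 * high_sum + quantize(low_sum, step)


if __name__ == "__main__":
    inputs = [3, -2, 5]
    weights = [-3, 127, -128]
    for step in (1, 8):
        error = hybrid_mac(inputs, weights, step) - exact_mac(inputs, weights)
        print(f"step={step}: error={error}")
```

With `step = 1` the quantizer is lossless and the hybrid result matches the exact MAC. A coarser `step` bounds the error by roughly half a step, and that error is confined to the low-place-value partial sum. This is the kind of accuracy-versus-readout-cost knob that a hybrid design has to balance.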