A 28nm 8928Kb/mm²-Weight-Density Hybrid SRAM/ROM Compute-in-Memory Architecture Reducing >95% Weight Loading from DRAM.
Guodong Yin,Yiming Chen,Mingyen Lee,Xirui Du,Yue Ke,Wenjun Tang,Zhonghao Chen,Mufeng Zhou,Jinshan Yue,Huazhong Yang,Hongyang Jia,Yongpan Liu,Xueqing Li
DOI: https://doi.org/10.1109/CICC60959.2024.10528966
2024-01-01
Abstract: Large transformer networks have demonstrated remarkable advancements in various AI tasks. However, the explosive growth in parameter count poses severe challenges for AI accelerators because of the huge amount of data movement. Compute-in-memory (CiM) has thus been proposed as a competitive approach to reducing data movement [2–6]. However, three main challenges limit the energy efficiency of CiM. First, the limited on-chip memory capacity severely degrades task-level efficiency because weights must be reloaded frequently from DRAM. This challenge can be addressed with the insight that large pre-trained models can be adapted to various downstream tasks with most of the weights unchanged. Therefore, a hybrid “ultra-dense-ROM + flexible-SRAM” CiM structure can significantly reduce off-chip DRAM access. Second, lower-resolution ADCs can significantly reduce conversion overhead but harm accuracy. The proposed adaptive-resolution ADC, which accumulates 2b updates onto a 5b partial sum, reduces the conversion overhead while maintaining high accuracy. Third, the energy efficiency of charge-domain computing is limited by large computing capacitors that are difficult to scale down under variations. This work substantially reduces the computing capacitors with post-fabrication 1-of-N capacitor selection (PFCS).
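
The first point, that keeping unchanged pre-trained weights in ROM removes most DRAM weight traffic, can be illustrated with a back-of-the-envelope model. This is only a sketch under stated assumptions, not the paper's evaluation method: the function names, the 8-bit weight width, the model size, and the 96% frozen fraction are all hypothetical values chosen for illustration. It also assumes the baseline reloads every weight from DRAM on each task switch.

```python
from typing import Tuple


def dram_weight_traffic(total_weights: int,
                        frozen_fraction: float,
                        bits_per_weight: int = 8) -> Tuple[int, int]:
    """Return (baseline_bits, hybrid_bits) loaded from DRAM per task switch.

    Assumption: the baseline reloads all weights, while the hybrid design
    keeps the frozen (unchanged) weights in ROM and reloads only the
    task-specific remainder into SRAM.
    """
    if not 0.0 <= frozen_fraction <= 1.0:
        raise ValueError("frozen_fraction must be within [0, 1]")
    baseline_bits = total_weights * bits_per_weight
    frozen_weights = int(round(total_weights * frozen_fraction))
    hybrid_bits = (total_weights - frozen_weights) * bits_per_weight
    return baseline_bits, hybrid_bits


def traffic_reduction(baseline_bits: int, hybrid_bits: int) -> float:
    """Fraction of DRAM weight traffic removed by the hybrid design."""
    return 1.0 - hybrid_bits / baseline_bits


if __name__ == "__main__":
    # Hypothetical example: 1M weights, 96% of them unchanged across tasks.
    baseline, hybrid = dram_weight_traffic(1_000_000, 0.96)
    print(f"baseline: {baseline} bits, hybrid: {hybrid} bits")
    print(f"reduction: {traffic_reduction(baseline, hybrid):.2%}")
```

Under this simple model the reduction equals the frozen fraction, so a figure above 95% corresponds to more than 95% of the weights staying fixed in ROM across downstream tasks.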