A 28-nm 36 Kb SRAM CIM Engine with 0.173 $\mu$m$^{2}$ 4T1T Cell and Self-Load-0 Weight Update for AI Inference and Training Applications

Chenyang Zhao, Jinbei Fang, Xiaoli Huang, Deyang Chen, Zhiwang Guo, Jingwen Jiang, Jiawei Wang, Jianguo Yang, Jun Han, Peng Zhou, Xiaoyong Xue, Xiaoyang Zeng
DOI: https://doi.org/10.1109/jssc.2024.3399615
IF: 5.4
2024-01-01
IEEE Journal of Solid-State Circuits
Abstract: Computing-in-memory (CIM) promises high energy efficiency (EE) and performance in accelerating the feed-forward (FF) and back-propagation (BP) processes of deep neural networks (DNNs) through reduced data movement and high parallelism. However, large memory cells, network mapping, and IR-drop variation remain obstacles to an efficient CIM implementation. This work presents a 28-nm 36 Kb static random-access memory (SRAM) CIM engine with a nondestructive-read (NDR) cell and weight-update energy saving, used for multiply-accumulate (MAC) acceleration in artificial intelligence (AI) inference and training applications. A 4T1T SRAM bit-cell with NDR is proposed and achieves the smallest recorded cell size of 0.173 $\mu$m$^{2}$. The power-on self-load-0 feature of the 4T1T cell saves the weight-update energy and latency of writing 0. The shared-path dual-mode read (SPDMR) supports both the FF and BP paths with less circuit overhead. The bit-interleaving weight mapping (BIWM) speeds up the BP path without slowing the FF path. IR-drop-aware adaptive clampers (IRDAA-Cs) with hierarchical read word-lines (RWLs) and read bit-lines (RBLs) apply voltages that are as accurate as possible to near and far cells. The engine achieves an EE of 263.1/412.1 TOPS/W and an area efficiency (AE) of 2.5/4.9 TOPS/$\mu$m$^{2}$ for the FF/BP process at 1-bit weight/activation, with a 74.4%-78.3% reduction in weight-update energy.
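The two read directions named in the abstract can be illustrated with a minimal software sketch. It assumes the usual DNN formulation, in which the FF pass accumulates along the rows of a stored weight matrix and the BP pass accumulates along its columns (the transpose), so one stored array serves both paths. The function names, array sizes, and values are hypothetical and are not taken from the paper; the sketch models only the arithmetic, not the SPDMR or BIWM circuits.

```python
# Hypothetical illustration: names, sizes, and values are assumptions,
# not the paper's design. Only the MAC arithmetic is modeled.

def ff_mac(weights, activations):
    """Feed-forward: one output per row, y[i] = sum_j W[i][j] * x[j]."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]

def bp_mac(weights, errors):
    """Back-propagation: one output per column, g[j] = sum_i W[i][j] * e[i]."""
    n_rows = len(weights)
    n_cols = len(weights[0])
    return [sum(weights[i][j] * errors[i] for i in range(n_rows))
            for j in range(n_cols)]

# 1-bit weights and 1-bit inputs, matching the 1-bit operating point cited above.
W = [[1, 0, 1],
     [0, 1, 1]]
x = [1, 1, 0]   # FF input activations (one per column)
e = [1, 1]      # BP input errors (one per row)

print(ff_mac(W, x))  # [1, 1]
print(bp_mac(W, e))  # [1, 1, 2]
```

Both functions read the same stored `W`; only the accumulation direction changes, which is why a shared read path for FF and BP avoids keeping a second, transposed copy of the weights.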