COMB-MCM: Computing-on-Memory-Boundary NN Processor with Bipolar Bitwise Sparsity Optimization for Scalable Multi-Chiplet-Module Edge Machine Learning.
Haozhe Zhu, Bo Jiao, Jinshan Zhang, Xinru Jia, Yunzhengmao Wang, Tianchan Guan, Shengcheng Wang, Dimin Niu, Hongzhong Zheng, Chixiao Chen, Mingyu Wang, Lihua Zhang, Xiaoyang Zeng, Qi Liu, Yuan Xie, Ming Liu
DOI: https://doi.org/10.1109/ISSCC42614.2022.9731657
2022-01-01
Abstract: Recently, computing-in-memory (CIM) macros, originally designed to reduce the intensive memory accesses of AI tasks, have been employed in low-power machine learning SoCs due to their ultra-high computing efficiency [1]–[3]. However, these CIM macros still fetch weight data from on-/off-chip memories, much like the processing elements in near-memory-computing architectures. This poses challenges once overall SoC energy efficiency is taken into account (Fig. 15.3.1). First, the memory wall remains unsolved. When large networks are deployed, massive off-chip weight transfers occur, and the resulting weight updates degrade overall system performance. Even for tiny machine learning tasks, the power and latency of constant weight updates cannot be neglected, because MAC computing efficiency has been optimized to the point that it closely matches the efficiency of on-chip memory access (2 pJ/b vs. 1 pJ/b). Second, the viability of structured and coarse-grained sparsity optimization is highly algorithm-dependent and requires explicit zero-detection blocks, while power optimization schemes for fine-grained or even arbitrary sparsity patterns are lacking. Third, edge machine learning chips are cost-sensitive. The conventional monolithic SoC design strategy, which fabricates a dedicated SoC for each application, is unaffordable in terms of non-recurring engineering (NRE) cost.
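A rough first-order estimate helps show why weight-access energy cannot be neglected once MAC energy per bit approaches memory-access energy per bit. The sketch below is hypothetical and is not the authors' model. The function name `weight_access_share` and the `reuse` parameter are illustrative, and it assumes, following the order in which the figures are quoted, that 2 pJ/b refers to MAC computation and 1 pJ/b to on-chip memory access.

```python
def weight_access_share(e_mac_pj_per_bit: float,
                        e_mem_pj_per_bit: float,
                        reuse: float = 1.0) -> float:
    """Fraction of per-bit energy spent fetching weights (hypothetical model).

    Assumption: each weight bit is fetched once from on-chip memory and then
    reused `reuse` times in MAC operations, so the fetch energy is amortized
    over `reuse` computations.
    """
    if reuse <= 0:
        raise ValueError("reuse must be positive")
    fetch = e_mem_pj_per_bit / reuse  # amortized fetch energy per MAC bit
    return fetch / (fetch + e_mac_pj_per_bit)


if __name__ == "__main__":
    # Assumed mapping: 2 pJ/b for MAC, 1 pJ/b for on-chip memory access.
    print(f"no reuse:  {weight_access_share(2.0, 1.0):.3f}")
    print(f"4x reuse:  {weight_access_share(2.0, 1.0, reuse=4):.3f}")
```

Under these assumptions, weight fetches account for about one third of the energy without reuse (0.333) and still about 11% with fourfold reuse (0.111). This is consistent with the point that weight updates remain a visible cost when compute and memory energies are of the same order.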