A 28 nm 0.25-0.61 mW 31-60 fps Versatile SoC for Diverse Extreme Edge ML Workloads with Flexible Hetero-Fabric Dataflow Orchestration and Compute/Storage-Density-Adjustable CIM
Yaolei Li, Wenbin Jia, Xinyuan Lin, Songming Yu, Yifan He, Wenxun Wang, Xiang Li, Lu Zhang, Yixuan Xie, Junyan Lin, Huazhong Yang, Hongyang Jia, Jinshan Yue, Yongpan Liu
DOI: https://doi.org/10.1109/esserc62670.2024.10719427
2024-01-01
Abstract: This work presents an ultra-low-power, versatile SoC for diverse extreme edge ML workloads. It has four key features: 1) A co-design of hetero-fabric and flexible dataflow orchestration that achieves high utilization of the compute-in-memory (CIM) core and the digital core (Dcore) with reduced L1 memory access. 2) Performance-balance-aware DVFS that enables a layer-wise adjustable clock frequency ratio between the CIM and the Dcore for further power reduction. 3) A reconfigurable-LUT-based CIM macro with dynamically adjustable compute and storage density to fit the varying layers of neural network models. 4) A multi-operator-fused unified Dcore, in which various operators are all implemented uniformly on multiply-accumulate-based reconfigurable PEs. The fabricated 28 nm chip achieves $7 \times$ higher SoC energy efficiency than the state-of-the-art tinyML SoC. It supports diverse extreme edge applications and various tinyML models, achieving practical performance (e.g., no less than 30 fps) with no more than 0.5 mW of power on all tasks of MLPerf Tiny while meeting the quality target.