Toggle Rate Aware Quantization Model Based on Digital Floating-Point Computing-in-Memory Architecture

Xi Chen, Yitong Zhao, An Guo, Jinwu Chen, Fangyuan Dong, Zhaoyang Zhang, Tianzhu Xiong, Bo Wang, Yuyao Kong, Xin Si
DOI: https://doi.org/10.1109/tcsii.2024.3354313
2024-01-01
Abstract: Computing-in-memory (CIM) has been proven to achieve high energy efficiency and significant acceleration on neural networks with high computational parallelism. Building on typical integer CIMs, floating-point CIMs (FP-CIMs) have recently been proposed to execute more accuracy-demanding tasks such as training and high-precision inference. However, prior research has not adequately explored the relationship between circuit design within the FP-CIM architecture and hardware/software metrics. Furthermore, in digital circuits, the data toggle rate significantly affects hardware performance. In this brief, a toggle rate-aware quantization model is proposed to define and explore the design space of FP-CIM. Based on the experimental results, some key considerations for FP-CIM design are derived. With the toggle rate reduction scheme, the toggle rate can be reduced by 28%, resulting in a 1.18x improvement in energy efficiency with only a 0.35% accuracy loss. To validate the model, a 28 nm digital FP-CIM test chip was fabricated, achieving an energy efficiency of 32.28 TFLOPS/W and an inference accuracy of 76.14% with DenseNet161 on the ImageNet dataset.
engineering, electrical & electronic
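The abstract's central metric, the data toggle rate, can be illustrated with a small sketch. It assumes the common definition (the average fraction of bits that flip between consecutive words on a bus); the abstract does not state the paper's exact definition, and the function name `toggle_rate` and the sample stream are hypothetical.

```python
def toggle_rate(words, bit_width):
    """Average fraction of bits that flip between consecutive words.

    Assumed definition for illustration only; not taken from the paper.
    """
    if len(words) < 2:
        return 0.0  # no transitions to count
    mask = (1 << bit_width) - 1
    # XOR marks the bit positions that differ between two consecutive words
    flips = sum(bin((a ^ b) & mask).count("1")
                for a, b in zip(words, words[1:]))
    return flips / (bit_width * (len(words) - 1))


# Hypothetical 4-bit data stream: 4 flips, then 1 flip, then 0 flips
stream = [0b0000, 0b1111, 0b1110, 0b1110]
print(toggle_rate(stream, 4))
```

Under this definition, a stream whose consecutive words share more bits yields a lower value, which is the quantity a toggle rate reduction scheme would aim to decrease.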