ZEBRA: A Zero-Bit Robust-Accumulation Compute-In-Memory Approach for Neural Network Acceleration Utilizing Different Bitwise Patterns

Yiming Chen,Guodong Yin,Hongtao Zhong,Mingyen Lee,Huazhong Yang,Sumitha George,Vijaykrishnan Narayanan,Xueqing Li
DOI: https://doi.org/10.1109/asp-dac58780.2024.10473851
2024-01-01
Abstract:Deploying a lightweight quantized model in compute-in-memory (CIM) might result in significant accuracy degradation due to reduced signal-noise rate (SNR). To address this issue, this paper presents ZEBRA, a zero-bit robust-accumulation CIM approach, which utilizes bitwise zero patterns to compress computation with ultra-high resilience against noise due to circuit non-idealities, etc. First, ZEBRA provides a cross-level design that successfully exploits value-adaptive zero-bit patterns to improve the performance in robust 8-bit quantization dramatically. Second, ZEBRA presents a multi-level local computing unit circuit design to implement the bitwise sparsity pattern, which boosts the area/energy efficiency by 2x-4x compared with existing CIM works. Experiments demonstrate that ZEBRA can achieve <1.0% accuracy loss in CIFAR10/100 with typical noise, while conventional CIM works suffer from > 10% accuracy loss. Such robustness leads to much more stable accuracy for high-parallelism inference on large models in practice.
What problem does this paper attempt to address?