MixMixQ: Quantization with Mixed Bit-Sparsity and Mixed Bit-Width for CIM Accelerators

Jinyu Bai, He Zhang, LongChao Liu, Pengfei Li, Wang Kang
DOI: https://doi.org/10.1145/3649476.3658809
2024-01-01
Abstract: Quantization is vital for deploying neural networks on Computing-In-Memory (CIM) based accelerators because memory devices and data interfaces have inherently limited representational capacity. However, traditional quantization algorithms often overlook CIM’s unique computing paradigm, leading to suboptimal performance. To address this, we introduce MixMixQ, a novel quantization algorithm designed specifically for CIM accelerators that integrates mixed bit-sparsity and mixed bit-width, improving overall hardware efficiency while preserving high accuracy. Notably, our method improves hardware efficiency by up to 294% compared to traditional quantization methods, with only a 0.13% decrease in accuracy compared to a full-precision network.