LKBQ: Pushing the Limit of Post-Training Quantization to Extreme 1 Bit

Tianxiang Li, Bin Chen, Qian-Wei Wang, Yujun Huang, Shu-Tao Xia
DOI: https://doi.org/10.1109/icip49359.2023.10222555
2023-01-01
Abstract: Recent advances have shown that post-training quantization (PTQ) can reduce hardware resource requirements and quantize deep models to low bit-widths in a short time, compared with Quantization-Aware Training (QAT). However, existing PTQ approaches lose substantial accuracy when quantizing a model to extremely low bit-widths, e.g., 1 bit. In this work, we propose layer-by-layer self-knowledge distillation binary post-training quantization (LKBQ), the first method capable of quantizing the weights of neural networks to 1 bit in the PTQ setting. We show that careful use of layer-by-layer self-distillation within LKBQ provides a significant performance boost. Furthermore, our evaluation shows that the initialization of the quantized network weights has a large impact on the results, so we propose three methods for weight initialization. Finally, in light of the characteristics of binarized networks, we propose a method named gradient scaling to further improve efficiency. Our experiments show that LKBQ pushes the limit of PTQ to extreme 1-bit for the first time.
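The abstract does not give the details of LKBQ, so the following is only a minimal sketch of the general idea of 1-bit weight quantization scored against a full-precision layer. It assumes a common binarization baseline (sign of each weight times a per-row scale equal to the mean absolute weight), which is not confirmed to be one of the paper's three initializations. The layer-wise loss is simply the mean squared error between the full-precision (teacher) and binarized (student) outputs of one linear layer on calibration inputs. All function names are hypothetical, and the loss is only evaluated here, not optimized.

```python
import random

def binarize_row(row):
    """Assumed baseline: alpha = mean(|w|), signs in {-1, +1}."""
    alpha = sum(abs(w) for w in row) / len(row)
    signs = [1.0 if w >= 0 else -1.0 for w in row]
    return alpha, signs

def binarize_layer(weights):
    """Replace every row by alpha * sign(row), i.e. 1 bit per weight plus one scale per row."""
    out = []
    for row in weights:
        alpha, signs = binarize_row(row)
        out.append([alpha * s for s in signs])
    return out

def linear(weights, x):
    """Output of a bias-free linear layer for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def layer_distill_loss(fp_weights, q_weights, calib_inputs):
    """Mean squared error between teacher (full-precision) and student (binarized) outputs."""
    total, count = 0.0, 0
    for x in calib_inputs:
        teacher = linear(fp_weights, x)
        student = linear(q_weights, x)
        total += sum((t - s) ** 2 for t, s in zip(teacher, student))
        count += len(teacher)
    return total / count

# Hypothetical toy layer (4 outputs, 8 inputs) and 16 calibration vectors.
rng = random.Random(0)
fp = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
calib = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]

q = binarize_layer(fp)
loss = layer_distill_loss(fp, q, calib)
print(f"layer-wise reconstruction loss after binarization: {loss:.4f}")
```

In a layer-by-layer scheme, a loss of this kind would be minimized for one layer at a time on a small calibration set, with the full-precision layer acting as the teacher for its own binarized copy.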