Abstract: The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its counterpart, i.e., uniform strategy, due to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process in implementing the nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being as hardware-friendly and efficient as uniform quantization for model inference. We achieve this through learning the flexible in-equidistant input thresholds to better fit the underlying distribution while quantizing these real-valued inputs into equidistant output levels. To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) for intractable backward derivative calculation w.r.t. threshold parameters. Additionally, we consider entropy preserving regularization to further reduce information loss in weight quantization. Even under this adverse constraint of imposing uniformly quantized weights and activations, our N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.5~1.7% on ImageNet, demonstrating the contribution of N2UQ design. Code and models are available at: https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization

What problem does this paper attempt to address?

This paper addresses the trade-off between hardware implementation efficiency and model accuracy in quantization methods for neural network compression. Specifically:

1. **Hardware implementation efficiency**: Nonuniform quantization usually achieves better accuracy than uniform quantization, but its complex projection process adds significant time and space overhead in hardware deployment. For example, the outputs of nonuniform quantization are usually floating-point numbers, which must be mapped to binary digits through Look-Up Tables (LUTs) before multiplication can be accelerated. This increases hardware area and energy consumption.
2. **Model performance**: Uniform quantization is more hardware-friendly, but its fixed-interval quantization levels cannot adapt well to different input distributions. The result is larger quantization error and lower model accuracy. The drop relative to the full-precision model is especially pronounced in low-bit settings such as 2-bit.

To improve hardware efficiency and model accuracy at the same time, the paper proposes the **Nonuniform-to-Uniform Quantization (N2UQ)** method. Its main goals are:

- **Maintaining hardware-friendliness**: Because the quantized outputs are uniformly spaced, quantized weights and activations can be used directly in efficient bitwise operations, with no additional post-processing step.
- **Improving quantization accuracy**: The input thresholds are learned so that they fit the underlying data distribution, which reduces quantization error and improves model accuracy.

### Main contributions

1. **Proposing N2UQ**: A quantization method that improves accuracy by learning input thresholds while remaining as hardware-friendly as uniform quantization.
2. **Introducing the Generalized Straight-Through Estimator (G-STE)**: G-STE handles the otherwise intractable gradient with respect to the input threshold parameters. It lets the thresholds be adjusted automatically during training and gives a finer-grained approximation of the quantization function.
3. **Proposing weight regularization**: Based on an entropy analysis, a weight regularization method is proposed that further reduces information loss during quantization.
4. **Experimental verification**: Extensive experiments on ImageNet show that N2UQ improves accuracy across different architectures and bit-width constraints. In particular, the 2-bit ResNet-50 model reaches a top-1 accuracy of 76.4%, only 0.6% lower than the full-precision model.

### Formula representation

- **Quantization output** (n-bit quantizer with input thresholds T_i):

\[ x_q=\begin{cases} 0 & \text{if } x_r < T_1\\ 1 & \text{if } T_1\leq x_r < T_2\\ \vdots & \vdots\\ 2^n - 1 & \text{if } x_r\geq T_{2^n - 1} \end{cases} \]

- **Back-propagation of G-STE**:

\[ \frac{\partial x_q}{\partial x_r}=E\left[\frac{\partial\tilde{x}_q}{\partial x_r}\right]=\frac{\partial}{\partial x_r}E[\tilde{x}_q]=\begin{cases} \frac{\partial}{\partial x_r}\left(\frac{x_r - d_{i - 1}}{a_i} + i - 1\right) & \text{if } d_{i - 1}\leq x_r < d_i\\ 0 & \text{otherwise} \end{cases} \]

  Inside each interval the gradient therefore equals 1/a_i.

- **Weight regularization**:

\[ \max H =-\sum_{i = 1}^N p_i\log(p_i) \]
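The three formulas above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `n2u_forward`, `gste_input_grad`, and `level_entropy` are hypothetical, the interval boundaries `d` are passed in explicitly, and placing the thresholds at the interval midpoints in the usage lines is an assumption made only for illustration, since the summary does not state how the T_i relate to the d_i.

```python
import numpy as np

def n2u_forward(x_r, thresholds):
    """Map real inputs to integer levels 0..len(thresholds).

    An input below T_1 gets level 0, an input in [T_k, T_{k+1}) gets level k,
    and an input at or above the last threshold gets the top level.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # side="right" counts how many thresholds satisfy T_k <= x_r.
    return np.searchsorted(thresholds, x_r, side="right")

def gste_input_grad(x_r, d):
    """G-STE-style gradient of the output w.r.t. the input.

    d holds interval boundaries d_0 < d_1 < ... (assumed given here).
    Inside [d_{i-1}, d_i) the slope is 1 / a_i with a_i = d_i - d_{i-1};
    outside all intervals the gradient is 0.
    """
    x = np.asarray(x_r, dtype=float)
    d = np.asarray(d, dtype=float)
    a = np.diff(d)                                   # interval widths a_i
    idx = np.searchsorted(d, x, side="right") - 1    # 0-based interval index
    inside = (idx >= 0) & (idx < len(a))
    grad = np.zeros_like(x)
    grad[inside] = 1.0 / a[idx[inside]]
    return grad

def level_entropy(x_q, num_levels):
    """Entropy H = -sum_i p_i log(p_i) of the empirical level distribution."""
    counts = np.bincount(np.asarray(x_q).ravel(), minlength=num_levels)
    p = counts / counts.sum()
    p = p[p > 0]                                     # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p))

# Usage: 2-bit quantizer (4 levels, 3 thresholds) with unequal input intervals.
d = np.array([0.0, 0.5, 2.0, 3.0])          # assumed interval boundaries
thresholds = (d[:-1] + d[1:]) / 2           # assumption: midpoints of intervals
x = np.array([-1.0, 0.25, 1.0, 2.5, 3.5])
levels = n2u_forward(x, thresholds)
grads = gste_input_grad(x, d)
entropy = level_entropy(levels, num_levels=4)
```

The output levels are equally spaced integers even though the input intervals have different widths, and the backward slope is larger in narrow intervals than in wide ones. The entropy is highest when all levels are used equally often, which is the quantity the weight regularization aims to increase.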

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

Optimizing Quantized Neural Networks in a Weak Curvature Manifold

Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks

Hessian-based Mixed-Precision Quantization with Transition Aware Training for Neural Networks

Pse: Mixed Quantization Framework of Neural Networks for Efficient Deployment

Fast Non-Uniform Quantization of Neural Networks

CNQ: Compressor‐Based Non‐uniform Quantization of Deep Neural Networks

AUSN: Approximately Uniform Quantization by Adaptively Superimposing Non-uniform Distribution for Deep Neural Networks

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Quantization Networks

Instance-Aware Dynamic Neural Network Quantization

EQ-Net: Elastic Quantization Neural Networks

I $^2$ NQ: Inter and Intra Nonuniform Quantization for Single Image Super-Resolution

Two-Step Quantization for Low-bit Neural Networks

Error-aware Quantization through Noise Tempering

μL2Q: An Ultra-Low Loss Quantization Method for DNN Compression

Adaptive Gradients and Weight Projection Based on Quantized Neural Networks for Efficient Image Classification

Learning Accurate Low-bit Quantization towards Efficient Computational Imaging

Post-Training Non-Uniform Quantization for Convolutional Neural Networks

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights