Abstract: Quantized neural networks (QNNs) are useful for neural network acceleration and compression, but they pose a challenge during training: how to propagate the gradient of the loss function through a computation graph whose derivative is 0 almost everywhere. To address this non-differentiability, we propose a novel Asymptotic-Quantized Estimator (AQE) to estimate the gradient. In particular, during back-propagation, the graph that relates inputs to outputs remains smooth and differentiable. By the end of training, the weights and activations have been quantized to low precision because of the asymptotic behaviour of AQE. We also propose an M-bit Inputs and N-bit Weights Network (MINW-Net) trained by AQE, a quantized neural network with 1-3 bit weights and activations. In the inference phase, we can use XNOR or SHIFT operations in place of convolution operations to accelerate the MINW-Net. Our experiments on CIFAR datasets demonstrate that our AQE is well defined, and that QNNs trained with AQE perform better than those trained with the Straight-Through Estimator (STE). For example, for the same ConvNet with 1-bit weights and activations, our MINW-Net with AQE achieves a prediction accuracy 1.5% higher than the Binarized Neural Network (BNN) with STE. The MINW-Net, trained from scratch by AQE, achieves classification accuracy comparable to its 32-bit counterparts on the CIFAR test sets. Extensive experimental results on the ImageNet dataset show the superiority of the proposed AQE, and our MINW-Net achieves results comparable to other state-of-the-art QNNs.
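To illustrate the XNOR-based inference step mentioned in the abstract, the following is a minimal sketch of the standard XNOR-popcount identity for 1-bit values: for two vectors with entries in {+1, -1} encoded as bits, the dot product equals 2 * (number of matching bits) - n. The helper names `pack_bits` and `xnor_dot` are hypothetical and chosen for illustration only; this is not the MINW-Net implementation, and it covers only the 1-bit case, not SHIFT-based multi-bit arithmetic.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is 1 when values[i] == +1.

    Hypothetical helper for illustration, not taken from the paper.
    """
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR restricted to the low n bits
    matches = bin(agree).count("1")     # popcount: positions where signs agree
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * matches - n


# Compare against an ordinary multiply-accumulate on the same vectors.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))
fast = xnor_dot(pack_bits(a), pack_bits(b), len(a))
assert fast == reference
```

In a binarized convolution, each output value is a dot product of this kind between a weight patch and an activation patch, which is why the multiply-accumulate can be replaced by bitwise operations on packed words.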

Efficient Quantization for Neural Networks with Binary Weights and Low Bitwidth Activations

Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks

Residual Quantization for Low Bit-Width Neural Networks

Weighted-Entropy-Based Quantization for Deep Neural Networks

Quantization Networks

Instance-Aware Dynamic Neural Network Quantization

A Novel Low-Bit Quantization Strategy for Compressing Deep Neural Networks

Activations Quantization for Compact Neural Networks

Training Compact Neural Networks with Binary Weights and Low Precision Activations

Searching for Low-Bit Weights in Quantized Neural Networks

Alternating Multi-bit Quantization for Recurrent Neural Networks

Learning Bilateral Clipping Parametric Activation for Low-Bit Neural Networks

Effective Quantization Methods for Recurrent Neural Networks

Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks

Neural Network Activation Quantization with Bitwise Information Bottlenecks

General Bitwidth Assignment for Efficient Deep Convolutional Neural Network Quantization

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks