Abstract: Many network quantization algorithms have been proposed to accelerate and compress deep neural networks (DNNs). Although the quantization strategy of a given state-of-the-art algorithm may outperform others on some network architectures, it is hard to prove that the strategy is always better than the alternatives, or even that it is the best choice for every layer in a network. In other words, existing quantization algorithms are suboptimal because they ignore the different characteristics of different layers and quantize all layers with a single, uniform quantization strategy. To address this issue, we propose differentiable quantization strategy search (DQSS), which assigns an optimal quantization strategy to each individual layer by taking advantage of the strengths of different quantization algorithms. Specifically, we formulate DQSS as a differentiable neural architecture search problem and adopt an efficient convolution to explore mixed quantization strategies from a global perspective through gradient-based optimization. We apply DQSS to post-training quantization so that the quantized models reach performance comparable to their full-precision counterparts. We also employ DQSS in quantization-aware training to further validate its effectiveness. To circumvent the expensive optimization cost of DQSS in quantization-aware training, we update the hyper-parameters and the network parameters in a single forward-backward pass. In addition, we adjust the optimization process to avoid a potential under-fitting problem. Comprehensive experiments on a high-level computer vision task (image classification) and a low-level computer vision task (image super-resolution), across various network architectures, show that DQSS can outperform state-of-the-art methods.
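The per-layer search space described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the two candidate quantizers, the `clip_ratio` parameter, and every function name below are assumptions chosen for illustration. The sketch only shows the general idea of a differentiable strategy search, in which each layer holds one learnable logit per candidate strategy, the layer output during search is a softmax-weighted mixture of the candidates, and the strategy with the largest logit is kept afterwards.

```python
import math


def quantize_minmax(w, bits=4):
    """Symmetric uniform quantizer; the scale is set by the largest absolute weight."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(x) for x in w)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in w]


def quantize_clipped(w, bits=4, clip_ratio=0.5):
    """Same quantizer with a clipped range (clip_ratio is a hypothetical parameter)."""
    qmax = 2 ** (bits - 1) - 1
    clip = clip_ratio * max(abs(x) for x in w)
    scale = clip / qmax if clip > 0 else 1.0
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in w]


def softmax(logits):
    """Numerically stable softmax over a list of strategy logits."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_quantize(w, alphas, strategies):
    """Search-time output: softmax(alphas)-weighted sum of all candidate outputs."""
    probs = softmax(alphas)
    outs = [q(w) for q in strategies]
    return [sum(p * out[i] for p, out in zip(probs, outs)) for i in range(len(w))]


def select_strategy(alphas, strategies):
    """After the search, keep the candidate with the largest logit for this layer."""
    best = max(range(len(alphas)), key=lambda k: alphas[k])
    return strategies[best]


# Example: one layer's weights, two candidate strategies, equal initial logits.
strategies = [quantize_minmax, quantize_clipped]
weights = [0.9, -0.3, 0.05, 0.6]
mixture = mixed_quantize(weights, [0.0, 0.0], strategies)
chosen = select_strategy([0.1, 2.0], strategies)
```

In a real search the logits would be trained by gradient descent alongside (or alternately with) the network weights; the sketch omits that step and only shows how the mixture and the final selection are formed.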

Differentiable Search for Finding Optimal Quantization Strategy
