Abstract: The huge computational costs of convolution and batch normalization (BN) pose great challenges for the online training and corresponding applications of deep neural networks (DNNs), especially on resource-limited devices. Existing works focus on accelerating either convolution or BN, and no solution alleviates both problems with satisfactory performance. Online training is gradually becoming a trend on resource-limited devices such as mobile phones, yet there is still no complete technical scheme with acceptable model performance, processing speed, and computational cost. In this research, we propose an efficient online-training quantization framework, termed EOQ, that combines Fixup initialization with a novel quantization scheme for online training on resource-limited devices. Based on the proposed framework, we realize full 8-bit integer network training and remove BN in large-scale DNNs. In particular, weight updates are quantized to 8-bit integers for the first time. We further give a theoretical analysis of how EOQ uses Fixup initialization to remove BN, based on a novel Block Dynamical Isometry theory with weaker assumptions. Benefiting from rational quantization strategies and the absence of BN, full 8-bit networks based on EOQ achieve state-of-the-art accuracy and large advantages in computational cost and processing speed. Experiments show that the 8-bit EOQ networks improve accuracy by 2.78%, 3.85%, and 4.31% over existing full 8-bit integer networks on ResNet-18/34/50, respectively. At the same time, the 8-bit EOQ networks greatly improve computing speed and reduce power consumption and circuit area by about an order of magnitude compared with 32-bit floating-point vanilla networks.
In addition to the large gains that quantization brings to convolution operations, 8-bit networks based on EOQ without BN consume more than 66× less power and run more than 13× faster than traditional 32-bit floating-point BN during inference. Moreover, the design of deep learning chips can be greatly simplified because the hardware-unfriendly square root operations in BN are no longer needed. Beyond this, EOQ has been shown to be more advantageous in small-batch online training, where each batch contains few samples. In summary, the EOQ framework is specifically designed to reduce the high cost of convolution and BN in network training, and it shows broad application prospects for online training on resource-limited devices.
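
To make the idea of an 8-bit integer weight update concrete, the following is a minimal sketch of generic symmetric per-tensor quantization. It is not the EOQ quantization scheme, whose details are not given in this abstract; the function names, the single shared scale, and the sample values are all assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers using one shared scale.

    Assumed scheme (not taken from the paper): symmetric, per-tensor,
    with the largest magnitude mapped to 127.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        # All-zero input: any scale works, so use 1.0.
        return [0 for _ in values], 1.0
    scale = max_abs / 127.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [qi * scale for qi in q]


# Hypothetical weight update values, for illustration only.
update = [0.02, -0.013, 0.0004, -0.0301]
q, scale = quantize_int8(update)
restored = dequantize_int8(q, scale)
```

In this sketch each restored value differs from the original by at most half of `scale`, which is the usual rounding error bound for uniform quantization.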

Towards efficient full 8-bit integer DNN online training on resource-limited devices without batch normalization
