Abstract: In object detection, deep learning has greatly improved accuracy over earlier algorithms and has been widely adopted in recent years. However, deep-learning-based object detection requires substantial hardware (HW) resources because of the heavy computation needed for high performance, which makes real-time operation on embedded platforms very difficult. Various compression methods have therefore been studied to address this problem. In particular, quantization methods greatly reduce the computational burden by reducing the number of bits used to represent weights and activations. However, most existing studies target only object classification and cannot be applied to object detection. Furthermore, most existing quantization studies are based on floating-point operations, which require additional effort when implementing HW accelerators. This paper proposes an HW-friendly, fixed-point-based quantization method that can also be applied to object detection. In the proposed method, the weight distribution is centered at zero by subtracting the mean of the weight parameters before quantization, and retraining is applied iteratively to minimize the accuracy drop caused by quantization. Furthermore, when the proposed method is applied to object detection, performance degradation is minimized by taking into account the minimum and maximum values of the network's weight parameters. When the proposed quantization method is applied to the representative one-stage object detectors You Only Look Once v3 and v4 (YOLOv3 and YOLOv4), detection accuracy on the COCO dataset remains similar to that of the original networks in single-precision floating-point format (32-bit), even though the weights are expressed with only about 20% of the bits of that format.
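The zero-centering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function names `zero_centered_quantize` and `dequantize`, the 8-bit default, and the rule that derives the number of fractional bits from the largest absolute centered weight (that is, from the minimum and maximum values). The iterative retraining stage is not shown.

```python
import math


def zero_centered_quantize(weights, total_bits=8):
    """Hypothetical sketch: subtract the mean, then quantize to signed fixed point.

    Returns (codes, mean, frac_bits). The code layout is an assumption:
    1 sign bit, with the remaining bits split between integer and fraction.
    """
    # Center the weight distribution at zero.
    mean = sum(weights) / len(weights)
    centered = [w - mean for w in weights]

    # Use the min/max of the centered weights to pick the fixed-point range.
    max_abs = max(abs(min(centered)), abs(max(centered)))
    int_bits = math.ceil(math.log2(max_abs)) if max_abs > 0 else 0
    frac_bits = total_bits - 1 - int_bits
    scale = 2.0 ** frac_bits

    # Round to the nearest code and clip to the signed integer range.
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    codes = [min(q_max, max(q_min, round(c * scale))) for c in centered]
    return codes, mean, frac_bits


def dequantize(codes, mean, frac_bits):
    """Map fixed-point codes back to real values and restore the mean."""
    scale = 2.0 ** frac_bits
    return [code / scale + mean for code in codes]


# Example: weights clustered around 1.0 use the full code range once centered.
weights = [0.9, 1.0, 1.1, 1.2, 0.8]
codes, mean, frac_bits = zero_centered_quantize(weights, total_bits=8)
restored = dequantize(codes, mean, frac_bits)
```

In this sketch, removing the mean lets the available bits cover only the spread of the weights rather than their offset from zero, which is one plausible reading of why centering helps at low bit-widths.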

Compression for Text Detection and Recognition Based on Low Bit-Width Quantization

Hessian-based Mixed-Precision Quantization with Transition Aware Training for Neural Networks

Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization

Focused Quantization for Sparse CNNs

Towards Super Compressed Neural Networks for Object Identification: Quantized Low-Rank Tensor Decomposition with Self-Attention

Space Efficient Quantization for Deep Convolutional Neural Networks

Deep Neural Network Compression With Single and Multiple Level Quantization

Residual Quantization for Low Bit-Width Neural Networks

Unsupervised Network Quantization via Fixed-Point Factorization

Adaptive Layerwise Quantization for Deep Neural Network Compression

Effective Quantization Methods for Recurrent Neural Networks

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

Weight Normalization based Quantization for Deep Neural Network Compression

Instance-Aware Dynamic Neural Network Quantization

Compressing Deep Convolutional Networks using Vector Quantization

Quantization Networks

Zero-Centered Fixed-Point Quantization With Iterative Retraining for Deep Convolutional Neural Network-Based Object Detectors

A 4-Bit Integer-Only Neural Network Quantization Method Based on Shift Batch Normalization

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Neural Network Language Model Compression with Product Quantization and Soft Binarization

Resiliency of Deep Neural Networks under Quantization