Resiliency of Deep Neural Networks under Quantization

Wonyong Sung, Sungho Shin, Kyuyeon Hwang

DOI: https://doi.org/10.48550/arXiv.1511.06488

2016-01-07

Abstract: The complexity of deep neural network algorithms for hardware implementation can be much lowered by optimizing the word-length of weights and signals. Direct quantization of floating-point weights, however, does not show good performance when the number of bits assigned is small. Retraining of quantized networks has been developed to relieve this problem. In this work, the effects of retraining are analyzed for a feedforward deep neural network (FFDNN) and a convolutional neural network (CNN). The network complexity is controlled to know their effects on the resiliency of quantized networks by retraining. The complexity of the FFDNN is controlled by varying the unit size in each hidden layer and the number of layers, while that of the CNN is done by modifying the feature map configuration. We find that the performance gap between the floating-point and the retrain-based ternary (+1, 0, -1) weight neural networks exists with a fair amount in 'complexity limited' networks, but the discrepancy almost vanishes in fully complex networks whose capability is limited by the training data, rather than by the number of connections. This research shows that highly complex DNNs have the capability of absorbing the effects of severe weight quantization through retraining, but connection limited networks are less resilient. This paper also presents the effective compression ratio to guide the trade-off between the network size and the precision when the hardware resource is limited.
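The abstract contrasts direct quantization of floating-point weights with retraining of the quantized network. The following is a minimal sketch of both ideas, and it rests on several assumptions. It uses NumPy and a uniform symmetric quantizer with a single step size, chosen by grid search to minimize the L2 quantization error. With 3 levels this gives ternary weights, and with 7 levels it gives the 7-level set. Retraining is modeled as a straight-through update on a toy linear model: the forward pass uses quantized weights, and the gradient is applied to a floating-point copy. The function names `quantize`, `best_step`, and `retrain_step` are hypothetical. The paper's exact procedure is not reproduced here, including per-layer step sizes, signal quantization, and the FFDNN/CNN architectures.

```python
import numpy as np


def quantize(w, step, levels):
    """Round each weight to the nearest of `levels` uniformly spaced values.

    levels=3 gives the ternary set {-1, 0, +1} * step;
    levels=7 gives {-3, -2, -1, 0, +1, +2, +3} * step.
    """
    max_index = (levels - 1) // 2
    index = np.clip(np.round(w / step), -max_index, max_index)
    return index * step


def best_step(w, levels, num_candidates=200):
    """Grid-search the step size that minimizes the L2 quantization error."""
    max_abs = float(np.max(np.abs(w)))
    candidates = np.linspace(max_abs / num_candidates, max_abs, num_candidates)
    errors = [np.sum((w - quantize(w, s, levels)) ** 2) for s in candidates]
    return float(candidates[int(np.argmin(errors))])


def retrain_step(w_float, x, y, step, levels, lr=0.05):
    """One retraining update for a linear model with squared-error loss.

    The forward pass uses the quantized weights; the gradient computed with
    them is applied to the floating-point copy (straight-through assumption).
    The step size stays fixed during retraining (also an assumption).
    """
    w_q = quantize(w_float, step, levels)
    residual = x @ w_q - y
    grad = x.T @ residual / len(y)  # gradient of 0.5 * mean squared error
    return w_float - lr * grad


# Direct quantization: compare ternary and 7-level error on Gaussian weights.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=1000)
for levels in (3, 7):
    s = best_step(w, levels)
    mse = np.mean((w - quantize(w, s, levels)) ** 2)
    print(f"{levels}-level: step={s:.3f}, quantization MSE={mse:.4f}")

# Retraining on a toy regression task whose targets come from float weights.
w_true = rng.normal(size=20)
x = rng.normal(size=(200, 20))
y = x @ w_true
step = best_step(w_true, 3)
w_float = w_true.copy()
print("loss before retraining:", 0.5 * np.mean((x @ quantize(w_float, step, 3) - y) ** 2))
for _ in range(50):
    w_float = retrain_step(w_float, x, y, step, levels=3)
print("loss after retraining:", 0.5 * np.mean((x @ quantize(w_float, step, 3) - y) ** 2))
```

Keeping a floating-point copy lets small gradient updates accumulate until a weight crosses a quantization threshold. Applying the same updates directly to the quantized values would round most of them away.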

Machine Learning, Neural and Evolutionary Computing

What problem does this paper attempt to address?

This paper addresses how quantization (i.e., reducing the word length of weights and signals) affects the performance of deep neural networks (DNNs). Specifically, the researchers analyzed how direct quantization, and quantization followed by retraining, affect feedforward deep neural networks (FFDNNs) and convolutional neural networks (CNNs) of different complexities. The main objectives of the study are:

1. **Analysis of quantization effects**: The researchers varied the number of hidden-layer units and the number of layers in the FFDNN, and the feature-map configuration in the CNN, to study how quantization affects network performance. In particular, they examined whether performance declines significantly when the network is quantized to a small number of bits (such as 2 or 3 bits).
2. **Effect of retraining**: The study explores whether retraining can restore the performance of quantized networks, especially with ternary (+1, 0, -1) or 7-level (+3, +2, +1, 0, -1, -2, -3) weight representations.
3. **Relationship between network complexity and quantization**: Highly complex DNNs were found to absorb the effects of severe weight quantization through retraining, while connection-limited networks are less resilient. This means that when a network is large enough, retraining can bring its low-precision version close to floating-point performance.
4. **Effective compression ratio (ECR)**: The paper proposes the effective compression ratio to guide the trade-off between network size and precision when hardware resources are limited. The ECR is defined as the ratio of the effective size of the uncompressed network to the size of the compressed network. It indicates how much memory quantization can save while maintaining the same performance.

In summary, this paper examines how the performance of deep neural networks changes under quantization. It also offers guidance for designing networks that maintain high performance when hardware resources are limited.
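The effective compression ratio described above can be illustrated with simple arithmetic. This is a minimal sketch under several stated assumptions:

- Storage counts weight bits only.
- The floating-point baseline uses 32 bits per weight.
- A quantized weight with `levels` values needs ceil(log2(levels)) bits, so 2 bits for ternary and 3 bits for 7-level.
- "Effective size of the uncompressed network" is read as the size of a floating-point network that reaches the same performance as the quantized one.

The function names and the parameter counts in the example are hypothetical, not figures from the paper.

```python
import math

FLOAT_BITS = 32  # assumption: the floating-point baseline stores 32-bit weights


def bits_per_weight(levels):
    """Bits needed to index `levels` quantization levels (3 -> 2, 7 -> 3)."""
    return math.ceil(math.log2(levels))


def effective_compression_ratio(float_params_same_perf, quant_params, levels):
    """ECR = effective uncompressed size / compressed size, both in bits.

    float_params_same_perf: weight count of a floating-point network that
    matches the quantized network's performance (hypothetical input).
    """
    effective_uncompressed_bits = float_params_same_perf * FLOAT_BITS
    compressed_bits = quant_params * bits_per_weight(levels)
    return effective_uncompressed_bits / compressed_bits


# Hypothetical numbers, not taken from the paper
print(effective_compression_ratio(1_000_000, 2_000_000, levels=3))  # 8.0
print(effective_compression_ratio(2_000_000, 2_000_000, levels=3))  # 16.0
```

In the first hypothetical case, the ternary network needs twice as many weights as the floating-point network to match its performance. Its ECR is therefore 8 rather than the nominal 32 / 2 = 16 suggested by bit widths alone. This is why the ratio can guide the trade-off between network size and precision.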
