Adaptive Layerwise Quantization for Deep Neural Network Compression

Xiaotian Zhu, Wengang Zhou, Houqiang Li
DOI: https://doi.org/10.1109/icme.2018.8486500
2018-01-01
Abstract: Building efficient deep neural network models has become a hot topic in deep learning research in recent years. Many works on network compression quantize a neural network to low-bitwidth weights and activations. However, most existing network quantization methods set a fixed bitwidth for the whole network, which leads to a large performance drop at high compression rates. In this paper we introduce an adaptive layerwise quantization method that assigns different bitwidths to different layers. Using the entropy of weights and activations as an importance indicator for each layer, we keep most layers at a high compression rate while assigning more bits to the few most important layers. Experiments on the CIFAR10 and ImageNet2012 datasets demonstrate that our layerwise quantization achieves smaller model size and lower computation cost than the fixed-bitwidth methods it is compared against at comparable accuracy, or higher accuracy at similar model size and computational complexity.
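The idea of using entropy as a per-layer importance indicator can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how entropy is estimated or how bits are allocated, so the histogram estimator, the function names `histogram_entropy` and `assign_bitwidths`, the bin count, and the 2-bit/8-bit choices are all assumptions made for illustration. The sketch also looks at weights only, whereas the paper uses both weights and activations.

```python
import math
import random


def histogram_entropy(values, num_bins=32):
    """Shannon entropy (in bits) of an equal-width histogram of `values`.

    Assumption: a simple histogram estimate stands in for whatever
    entropy measure the paper actually uses.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant layer has zero entropy
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def assign_bitwidths(layer_weights, low_bits=2, high_bits=8, num_important=1):
    """Give `high_bits` to the highest-entropy layers and `low_bits` to the rest.

    Hypothetical allocation rule: rank layers by entropy and promote the
    top `num_important` of them.
    """
    entropies = {name: histogram_entropy(w) for name, w in layer_weights.items()}
    ranked = sorted(entropies, key=entropies.get, reverse=True)
    important = set(ranked[:num_important])
    return {name: (high_bits if name in important else low_bits)
            for name in layer_weights}


# Toy layers with made-up weights (not data from the paper).
rng = random.Random(0)
layers = {
    # Weights spread evenly over [-1, 1]: high histogram entropy.
    "conv1": [rng.uniform(-1, 1) for _ in range(4096)],
    # Weights concentrated near zero with two outliers: low entropy.
    "conv2": [rng.gauss(0, 0.05) for _ in range(4096)] + [-1.0, 1.0],
    # All-zero weights: zero entropy.
    "fc": [0.0] * 4096,
}
bits = assign_bitwidths(layers)
print(bits)
```

In this toy setup only the evenly spread layer is promoted to the higher bitwidth, which mirrors the abstract's description of keeping most layers at a high compression rate while a few important layers receive more bits.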