Sensitivity-based Acceleration and Compression Algorithm for Convolution Neural Network.

Yue Niu, Zhenyu Liu, Chunsheng Mei, Xiangyang Ji, Wei Zhou, Dongsheng Wang
DOI: https://doi.org/10.1109/globalsip.2017.8309064
2017-01-01
Abstract: Convolutional neural networks (CNNs) perform impressively in image processing and other machine learning tasks, but the heavy computation and memory requirements of most classical models restrict their deployment on portable and power-limited devices. Efficient approaches to this problem shrink the network, and they fall into two categories: network pruning and low-rank approximation of kernel matrices. Compared with pruning, low-rank approximation achieves a lower compression ratio but is much friendlier to parallelism. In this paper, by analyzing how sensitive the network accuracy is to the rank of each layer, we propose a sensitivity-based layer-wise low-rank approximation algorithm. Compared with traditional rank-reduction methods, our proposal improves the acceleration ratio by 20%. Applied to the VGGNet-16 model, our method achieves a 2.7× compression/acceleration ratio on the convolutional layers and a 10.9× compression/acceleration ratio on the FC layers, with 0.05% top-1 accuracy loss and 0.01% top-5 accuracy loss.
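To make the compression arithmetic behind low-rank approximation concrete, here is a minimal sketch. It assumes the generic factorization of an m-by-n weight matrix into an m-by-r and an r-by-n factor, which may differ from the decomposition the paper actually uses. The function names are hypothetical, and the layer shape (4096 by 25088, the first FC layer of VGG-16) and the ranks are chosen for illustration only; they are not results or settings taken from the paper.

```python
import math


def dense_params(m: int, n: int) -> int:
    """Parameter count of a full m-by-n weight matrix."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameter count after factoring into (m-by-r) times (r-by-n)."""
    return r * (m + n)


def compression_ratio(m: int, n: int, r: int) -> float:
    """Dense parameters divided by rank-r factorized parameters."""
    return dense_params(m, n) / low_rank_params(m, n, r)


def max_rank_for_ratio(m: int, n: int, target: float) -> int:
    """Largest rank whose compression ratio is still at least `target`.

    The result is clamped to [1, min(m, n)], so for a very aggressive
    target the returned rank 1 may still fall short of it.
    """
    r = math.floor(m * n / (target * (m + n)))
    return max(1, min(r, min(m, n)))


if __name__ == "__main__":
    # Illustrative shape only: VGG-16's first FC layer maps 25088 -> 4096.
    m, n = 4096, 25088
    for r in (64, 256, 1024):
        print(f"rank {r}: {compression_ratio(m, n, r):.2f}x fewer parameters")
```

The same ratio applies to multiply-accumulate operations for a matrix-vector product, which is why the abstract can quote a single compression/acceleration figure per layer type. A sensitivity-based scheme would then choose a different rank for each layer rather than one global ratio; that selection step is not sketched here.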