Abstract: Convolutional neural networks (CNNs) have achieved remarkable success in various computer vision tasks, using tens of millions of parameters to learn from massive training data. However, CNNs often incur significant memory and computation costs, which prohibits their use in resource-limited environments such as mobile or embedded devices. To address these issues, existing approaches typically focus on either accelerating the convolutional layers or compressing the fully-connected layers separately, without pursuing a joint optimum. In this paper, we overcome this limitation by introducing a holistic CNN compression framework, termed LRDKT, which works across both convolutional and fully-connected layers. First, a low-rank decomposition (LRD) scheme is proposed to remove redundancies in both convolutional kernels and fully-connected matrices; it has a novel closed-form solver that is significantly more efficient than existing iterative optimization solvers. Second, a novel knowledge transfer (KT) based training scheme is introduced. To recover the accumulated accuracy loss and overcome vanishing gradients, KT explicitly aligns the outputs and intermediate responses of the student (compressed) network with those of the teacher (original) network. We have comprehensively analyzed and evaluated the compression and speedup ratios of the proposed model on the MNIST and ILSVRC 2012 benchmarks. On both benchmarks, the proposed scheme demonstrates superior performance over state-of-the-art methods. We also apply the proposed compression scheme to transfer learning tasks, including domain adaptation and object detection, where it likewise shows performance gains over the state of the art. Our source code and compressed models are available at https://github.com/ShaohuiLin/LRDKT.
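To make the low-rank idea concrete, the following is a minimal sketch of how a single fully-connected weight matrix can be replaced by two thinner factors. It uses a plain truncated SVD from NumPy, which is an assumption made for illustration and is not the closed-form LRD solver proposed in the paper. The function name `low_rank_factorize`, the layer size, and the chosen rank are all hypothetical.

```python
import numpy as np


def low_rank_factorize(W, rank):
    """Split W (m x n) into A (m x rank) and B (rank x n) via truncated SVD.

    Illustrative stand-in only: this is ordinary truncated SVD,
    not the paper's closed-form LRD solver.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B


# Hypothetical fully-connected layer: 512 inputs, 256 outputs.
rng = np.random.default_rng(0)
m, n, rank = 256, 512, 32
W = rng.standard_normal((m, n))

A, B = low_rank_factorize(W, rank)

# One large matrix-vector product becomes two smaller ones.
x = rng.standard_normal(n)
y_full = W @ x
y_low = A @ (B @ x)

# Parameter count drops from m*n to rank*(m + n).
params_full = W.size
params_low = A.size + B.size
compression_ratio = params_full / params_low

# Relative Frobenius-norm error of the rank-`rank` approximation.
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"relative approximation error: {rel_error:.3f}")
```

A random Gaussian matrix has little low-rank structure, so the approximation error here is large. Trained weight matrices are typically far more redundant, and the accuracy that is still lost after decomposition is what a retraining step such as the KT scheme described above is meant to recover.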

Pushing the limits of RNN Compression

Kronecker CP Decomposition with Fast Multiplication for Compressing RNNs

Run-Time Efficient RNN Compression for Inference on Edge Devices

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Holistic CNN Compression Via Low-Rank Decomposition with Knowledge Transfer

Tensor train decompositions on recurrent networks

MPDCompress - Matrix Permutation Decomposition Algorithm for Deep Neural Network Compression

Compression of Recurrent Neural Networks using Matrix Factorization

Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning

Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory

Convolutional Neural Network Compression Based on Low-Rank Decomposition

Comprehensive SNN Compression Using ADMM Optimization and Activity Regularization

ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets

ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model

Deep learning model compression using network sensitivity and gradients

L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning

Supervised Compression for Resource-Constrained Edge Computing Systems

Wide Compression: Tensor Ring Nets

Rank and run-time aware compression of NLP Applications

End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates