Abstract: Convolutional neural networks (CNNs) have achieved remarkable success in various computer vision tasks, and their tens of millions of parameters make them extremely powerful at learning from massive training data. However, CNNs often incur significant memory and computational costs, which prohibits their use in resource-limited environments such as mobile or embedded devices. To address this issue, existing approaches typically focus on either accelerating the convolutional layers or compressing the fully-connected layers separately, without pursuing a joint optimum. In this paper, we overcome this limitation by introducing a holistic CNN compression framework, termed LRDKT, which operates on both convolutional and fully-connected layers. First, a low-rank decomposition (LRD) scheme is proposed to remove redundancies across both convolutional kernels and fully-connected matrices; it comes with a novel closed-form solver that is significantly more efficient than existing iterative optimization solvers. Second, a novel knowledge transfer (KT) based training scheme is introduced. To recover the accumulated accuracy loss and overcome the vanishing gradient problem, KT explicitly aligns the outputs and intermediate responses of a teacher (original) network with those of its student (compressed) network. We comprehensively analyze and evaluate the compression and speedup ratios of the proposed model on the MNIST and ILSVRC 2012 benchmarks. On both benchmarks, the proposed scheme demonstrates superior performance gains over state-of-the-art methods. We also apply the proposed compression scheme to transfer learning tasks, including domain adaptation and object detection, where it shows strong performance gains over the state of the art. Our source code and compressed models are available at https://github.com/ShaohuiLin/LRDKT.
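
The parameter savings from low-rank decomposition, and the general shape of a knowledge-transfer objective, can be illustrated with a short sketch. It assumes that a d x d convolution with C input and N output channels is replaced by a d x 1 convolution with K output channels followed by a 1 x d convolution with N output channels, and that a fully-connected matrix is factored into two rank-r factors. The loss combines cross-entropy on the true label with squared-error alignment of outputs and intermediate responses. All function names, the weights `alpha` and `beta`, and the choice of squared error are hypothetical, and the code does not implement the closed-form LRD solver itself.

```python
import math
from typing import List, Sequence


def conv_params(c_in: int, c_out: int, d: int) -> int:
    """Weights in a dense d x d convolution (bias ignored)."""
    return c_in * c_out * d * d


def lowrank_conv_params(c_in: int, c_out: int, d: int, k: int) -> int:
    """Weights after splitting a d x d conv into a d x 1 conv with k output
    channels followed by a 1 x d conv with c_out output channels (assumed scheme)."""
    return c_in * d * k + k * d * c_out


def lowrank_fc_params(n_in: int, n_out: int, r: int) -> int:
    """Weights after factoring an n_out x n_in matrix into U (n_out x r) and V (r x n_in)."""
    return n_out * r + r * n_in


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kt_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
    label: int,
    alpha: float = 1.0,  # hypothetical weight of the output-alignment term
    beta: float = 1.0,  # hypothetical weight of the intermediate-response term
) -> float:
    """Hypothetical KT objective: hard-label cross-entropy on the student,
    plus squared-error alignment of outputs and of each intermediate response."""
    ce = -math.log(softmax(student_logits)[label])
    output_term = squared_error(student_logits, teacher_logits)
    response_term = sum(
        squared_error(s, t) for s, t in zip(student_feats, teacher_feats)
    )
    return ce + alpha * output_term + beta * response_term
```

For example, `lowrank_conv_params(256, 256, 3, 64)` gives 98,304 weights versus 589,824 for the dense layer (a 6x reduction), and factoring a 4096 x 4096 fully-connected matrix at rank 256 reduces 16,777,216 weights to 2,097,152 (8x). When the student reproduces the teacher exactly, both alignment terms vanish and only the label cross-entropy remains.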

Automatic CNN Compression Based on Hyper-parameter Learning

Structured Deep Neural Network Pruning by Varying Regularization Parameters

SparseConnect: Regularising CNNs on Fully Connected Layers

Compressing by Learning in a Low-Rank and Sparse Decomposition Form

Sensitivity-Oriented Layer-Wise Acceleration and Compression for Convolutional Neural Network

Holistic CNN Compression Via Low-Rank Decomposition with Knowledge Transfer

Graph Structure Learning-Based Compression Method for Convolutional Neural Networks

A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks

Regularized Training Framework for Combining Pruning and Quantization to Compress Neural Networks

A Model Compression Method Using Significant Data and Knowledge Distillation

Accelerating CNN Training by Sparsifying Activation Gradients

Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression

Two-Stage Model Compression and Acceleration: Optimal Student Network for Better Performance

Efficient Neural Network Compression Inspired by Compressive Sensing

AutoML for DenseNet Compression

A Survey of Model Compression and Acceleration for Deep Neural Networks

Improving Network Slimming with Nonconvex Regularization

Re-Training and Parameter Sharing with the Hash Trick for Compressing Convolutional Neural Networks

Towards Convolutional Neural Networks Compression Via Global Error Reconstruction

Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning

Flexi-Compression: A Flexible Model Compression Method for Autonomous Driving