Abstract: Convolutional neural networks (CNNs) have achieved remarkable success in various computer vision tasks, and they handle massive training data effectively by using tens of millions of parameters. However, CNNs often incur significant memory and computation costs, which prohibits their use in resource-limited environments such as mobile or embedded devices. To address these issues, existing approaches typically focus on either accelerating the convolutional layers or compressing the fully-connected layers separately, without pursuing a joint optimum. In this paper, we overcome this limitation by introducing a holistic CNN compression framework, termed LRDKT, which works across both convolutional and fully-connected layers. First, a low-rank decomposition (LRD) scheme is proposed to remove redundancies across both convolutional kernels and fully-connected matrices; it has a novel closed-form solver that is significantly more efficient than existing iterative optimization solvers. Second, a novel knowledge transfer (KT) based training scheme is introduced. To recover the accumulated accuracy loss and overcome the vanishing gradient problem, KT explicitly aligns the outputs and intermediate responses of the student (compressed) network with those of the teacher (original) network. We have comprehensively analyzed and evaluated the compression and speedup ratios of the proposed model on the MNIST and ILSVRC 2012 benchmarks. On both benchmarks, the proposed scheme demonstrates superior performance gains over state-of-the-art methods. We also demonstrate the proposed compression scheme on transfer learning tasks, including domain adaptation and object detection, where it shows performance gains over the state of the art. Our source code and compressed models are available at https://github.com/ShaohuiLin/LRDKT.
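The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the LRDKT solver or training scheme, whose details the abstract does not give. As a stand-in for the closed-form LRD step, it uses a plain truncated SVD on a fully-connected weight matrix. As a stand-in for KT, it uses a mean-squared-error alignment between teacher and student outputs and one intermediate response. The function names `low_rank_factorize` and `kt_loss`, the weight `alpha`, and all shapes are assumptions made for illustration.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Split W (m x n) into A (m x rank) and B (rank x n) via truncated SVD.

    A @ B is the best rank-`rank` approximation of W in the Frobenius norm.
    This is a generic closed-form factorization, assumed here as a stand-in;
    it is not claimed to be the solver used by LRDKT.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # scale the kept left singular vectors
    B = Vt[:rank, :]             # kept right singular vectors
    return A, B

def kt_loss(student_out, teacher_out, student_feat, teacher_feat, alpha=0.5):
    """Hypothetical alignment loss: MSE on outputs plus weighted MSE on one
    intermediate response. `alpha` is an assumed weighting, not from the paper."""
    out_term = np.mean((student_out - teacher_out) ** 2)
    feat_term = np.mean((student_feat - teacher_feat) ** 2)
    return out_term + alpha * feat_term

rng = np.random.default_rng(0)

# A 64 x 32 weight matrix built to have rank exactly 8, so truncating
# to rank 8 loses nothing beyond floating-point error.
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 32))
A, B = low_rank_factorize(W, rank=8)

params_before = W.size            # 64 * 32
params_after = A.size + B.size    # 64 * 8 + 8 * 32
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

# Teacher uses W directly; student uses the two smaller factors in sequence.
x = rng.standard_normal((4, 64))
teacher_out = x @ W
student_feat = x @ A              # intermediate response of the student
student_out = student_feat @ B
loss = kt_loss(student_out, teacher_out, student_feat, student_feat)

print(params_before, params_after, rel_err, loss)
```

Replacing one m x n layer with two layers of sizes m x r and r x n reduces the parameter count whenever r(m + n) < mn. On real, full-rank weights the truncation is lossy, which is the accuracy gap a teacher-student alignment term is meant to recover during fine-tuning.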

Joint architecture and knowledge distillation in CNN for Chinese text recognition

DCCD: Reducing Neural Network Redundancy Via Distillation

Collaborative Distillation for Ultra-Resolution Universal Style Transfer

Cross Architecture Distillation for Face Recognition

Deep Convolutional Neural Networks Based on Knowledge Distillation for Offline Handwritten Chinese Character Recognition

Cross-Architecture Knowledge Distillation

Refining Architectures of Deep Convolutional Neural Networks

Self-Distillation: Towards Efficient and Compact Neural Networks

Design of a Very Compact CNN Classifier for Online Handwritten Chinese Character Recognition Using DropWeight and Global Pooling

DTCNet: Transformer-CNN Distillation for Super-Resolution of Remote Sensing Image

Class Attention Transfer Based Knowledge Distillation

Promoting CNNs with Cross-Architecture Knowledge Distillation for Efficient Monocular Depth Estimation

Building Efficient CNN Architecture for Offline Handwritten Chinese Character Recognition

Highlight Every Step: Knowledge Distillation via Collaborative Teaching

A Study of Designing Compact Classifiers Using Deep Neural Networks for Online Handwritten Chinese Character Recognition

Offline Handwritten Chinese Text Recognition with Convolutional Neural Networks

Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition

Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition

A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation

Holistic CNN Compression Via Low-Rank Decomposition with Knowledge Transfer