Abstract: While previous network compression methods have achieved great success, most of them rely on abundant training data, which is unfortunately often unavailable in practice for reasons such as privacy issues, storage constraints, and transmission limitations. A promising way to address this problem is to perform compression with only a few unlabeled samples. Along this line, we propose a novel few-shot network compression framework named Few-Shot Slimming (FSS). FSS follows the student/teacher paradigm and consists of two steps: (1) construct the student by inheriting the principal feature maps of the teacher; (2) refine the student's feature representation by knowledge distillation with an enhanced mixing data augmentation method called GridMix. Specifically, in the first step, we employ normalized cross correlation to perform principal feature analysis, and then theoretically derive a new indicator to select the most informative feature maps of the teacher for the student. The indicator is based on the variances of the feature maps, which efficiently quantify the information richness of the input feature maps in a feature-agnostic manner. In the second step, we perform knowledge distillation on the student initialized in the first step, using GridMix, a novel grid-based mixing data augmentation technique that greatly enlarges the limited sample set. In this way, the student refines its feature representation and achieves better performance. Extensive experiments on multiple benchmarks demonstrate the state-of-the-art performance of FSS. For example, using only 0.2% of the CIFAR-10 training set as label-free data, FSS reduces the FLOPs of DenseNet-40 by 60% with a top-1 accuracy loss of only 0.8%, a result on par with those of conventional full-data methods.
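The abstract does not give the indicator's formula, so the following is only a minimal sketch of the general idea under stated assumptions: each channel's activations over the few samples are flattened into one vector, channels are ranked by variance (the information-richness proxy), and a channel is skipped when its absolute normalized cross correlation (NCC) with an already-kept channel exceeds a threshold. The greedy rule, the function names `variance`, `ncc`, and `select_channels`, and the 0.9 threshold are hypothetical choices for illustration, not the FSS algorithm.

```python
import math
from typing import List, Sequence


def variance(x: Sequence[float]) -> float:
    """Population variance of a flattened feature map."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross correlation of two flattened feature maps (Pearson form)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    den = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    # A constant map has no correlation structure; treat it as uncorrelated.
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def select_channels(fmaps: List[List[float]], k: int, max_ncc: float = 0.9) -> List[int]:
    """Hypothetical greedy selection: highest variance first, skip near-duplicates.

    fmaps[c] holds channel c's activations, flattened over all few-shot samples.
    Returns the sorted indices of the k teacher channels the student inherits.
    """
    order = sorted(range(len(fmaps)), key=lambda c: variance(fmaps[c]), reverse=True)
    kept: List[int] = []
    for c in order:
        if len(kept) == k:
            break
        if all(abs(ncc(fmaps[c], fmaps[s])) < max_ncc for s in kept):
            kept.append(c)
    # If the redundancy filter rejected too many channels, top up by variance.
    for c in order:
        if len(kept) == k:
            break
        if c not in kept:
            kept.append(c)
    return sorted(kept)


fmaps = [
    [0.0, 2.0, 0.0, 2.0],  # channel 0: variance 1.0
    [0.0, 1.0, 0.0, 1.0],  # channel 1: scaled copy of channel 0 (NCC = 1)
    [1.0, 0.0, 3.0, 0.0],  # channel 2: highest variance (1.5)
    [0.0, 0.0, 0.8, 0.8],  # channel 3: low variance, but not redundant
]
print(select_channels(fmaps, k=2))  # [0, 2]
print(select_channels(fmaps, k=3))  # [0, 2, 3]
```

With `k=3`, channel 1 is skipped despite having a higher variance than channel 3, because it is a scaled duplicate of channel 0 and would add no new information to the student.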
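The abstract names GridMix but does not define it. As a hedged illustration only, the sketch below implements one plausible grid-based mixing scheme: the image is divided into a `grid × grid` layout of cells, and each cell is copied from one of two source images chosen at random. No label mixing is needed, because in the distillation step the teacher's outputs on the mixed image serve as targets. The functions `grid_mix` and `expand`, their parameters, and the single-channel list-of-lists image format are assumptions, not the authors' implementation.

```python
import random
from typing import List, Optional

Image = List[List[float]]  # H x W, single channel to keep the sketch dependency-free


def grid_mix(a: Image, b: Image, grid: int, p: float = 0.5,
             rng: Optional[random.Random] = None) -> Image:
    """Hypothetical grid-based mix: each of the grid x grid cells comes from `a` or `b`.

    p is the probability that a given cell is taken from `b`.
    """
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("images must have the same shape")
    rng = rng or random.Random()
    h, w = len(a), len(a[0])
    # One Bernoulli(p) draw per grid cell decides the source image for that cell.
    from_b = [[rng.random() < p for _ in range(grid)] for _ in range(grid)]
    # Pixel (i, j) belongs to cell (i * grid // h, j * grid // w).
    return [
        [b[i][j] if from_b[i * grid // h][j * grid // w] else a[i][j] for j in range(w)]
        for i in range(h)
    ]


def expand(samples: List[Image], n: int, grid: int, seed: int = 0) -> List[Image]:
    """Enlarge a few-shot set to n mixed images by mixing random pairs of samples."""
    rng = random.Random(seed)
    return [grid_mix(*rng.sample(samples, 2), grid=grid, rng=rng) for _ in range(n)]
```

For example, mixing an all-zero and an all-one 4×4 image with `grid=2` yields an image in which each 2×2 quadrant is entirely 0 or entirely 1. Because the number of possible cell assignments grows as 2^(grid²) per image pair, even a small sample set can produce many distinct training inputs.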
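The abstract does not state which distillation loss FSS uses. For reference, a common choice is the temperature-softened KL divergence between teacher and student outputs, as in the standard Hinton-style formulation; a minimal dependency-free sketch follows. Applying it to logits rather than to intermediate feature maps, the default temperature of 4.0, and the function names `softmax` and `kd_loss` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], t: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            t: float = 4.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, t)  # soft targets from the teacher
    q = softmax(student_logits, t)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return t * t * kl
```

Because the targets come from the teacher rather than from ground truth, this kind of loss can be computed on unlabeled (and augmented) images, which matches the label-free setting described above.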
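To make the data budget and compression ratio in the CIFAR-10 example concrete: CIFAR-10 has 50,000 training images, so 0.2% of the training set is 100 images. If that subset is class-balanced (an assumption, since the abstract does not say), it amounts to 10 images for each of the 10 classes. A 60% FLOPs reduction means the student needs 40% of the teacher's FLOPs. Here N_few is notation introduced only for this calculation.

```latex
\begin{aligned}
N_{\text{few}} &= 0.2\% \times 50{,}000 = 0.002 \times 50{,}000 = 100 \text{ images}
  \quad (= 10 \text{ per class if class-balanced}),\\
\mathrm{FLOPs}_{\text{student}} &= (1 - 0.60)\,\mathrm{FLOPs}_{\text{teacher}} = 0.40\,\mathrm{FLOPs}_{\text{teacher}}.
\end{aligned}
```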

An Efficient Method for Model Pruning Using Knowledge Distillation with Few Samples

Few Sample Knowledge Distillation for Efficient Network Compression

DCCD: Reducing Neural Network Redundancy Via Distillation

Learning Slimming SSD Through Pruning and Knowledge Distillation

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing

Class-Aware Pruning for Efficient Neural Networks

Using Distillation to Improve Network Performance after Pruning and Quantization

A Model Compression Method Using Significant Data and Knowledge Distillation

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

Progressive Network Grafting for Few-Shot Knowledge Distillation

A Novel Architecture Slimming Method for Network Pruning and Knowledge Distillation

Model Selection - Knowledge Distillation Framework for Model Compression

Knowledge from the Original Network: Restore a Better Pruned Network with Knowledge Distillation

Model Compression Algorithm Via Reinforcement Learning and Knowledge Distillation

Towards Efficient Network Compression Via Few-Shot Slimming

Local Pruning Global Pruned Network under Knowledge Distillation

Few Shot Network Compression via Cross Distillation

Pruning-and-distillation: One-stage Joint Compression Framework for CNNs Via Clustering

CDFKD-MFS: Collaborative Data-free Knowledge Distillation Via Multi-level Feature Sharing

PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation

Data-Free Network Pruning for Model Compression