Abstract: Neural network compression techniques, such as knowledge distillation (KD) and network pruning, have received increasing attention. The recent work "Prune, then Distill" shows that a pruned, student-friendly teacher network can improve the performance of KD. However, the conventional teacher-student pipeline, which entails cumbersome pre-training of the teacher and complicated compression steps, makes pruning with KD less efficient. Beyond reducing model size, recent compression techniques also emphasize efficiency. Early pruning demands significantly less computation than conventional pruning methods because it does not require a large pre-trained model. Likewise, a special case of KD known as self-distillation (SD) is more efficient because it requires neither pre-training nor student-teacher pair selection. This motivates us to combine early pruning with SD for efficient model compression. In this work, we propose a framework named Early Pruning with Self-Distillation (EPSD), which identifies and preserves distillable weights in early pruning for a given SD task. EPSD combines early pruning and self-distillation in an efficient two-step process that maintains the pruned network's trainability for compression. Rather than simply applying pruning and SD in sequence, EPSD makes the pruned network favor SD by keeping more distillable weights before training, which ensures better distillation of the pruned network. We demonstrate that EPSD improves the training of pruned networks, supported by visual and quantitative analyses. Our evaluation covers diverse benchmarks (CIFAR-10/100, Tiny-ImageNet, full ImageNet, CUB-200-2011, and Pascal VOC), on which EPSD outperforms advanced pruning and SD techniques.
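
To make the two-step idea concrete, here is a minimal sketch of pruning before training with a saliency score derived from a self-distillation loss. It is not the authors' implementation: the abstract does not give EPSD's criterion, so a generic SNIP-style score |w · ∂L/∂w| stands in for it. The names `sd_loss`, `numeric_grad`, `keep_mask`, and `linear_logits`, the soft-target logits, and the `alpha` and `temperature` values are all hypothetical choices made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with an optional temperature.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sd_loss(logits, soft_target_logits, label, alpha=0.5, temperature=4.0):
    # Assumed SD objective: cross-entropy on the hard label plus a
    # temperature-scaled KL term toward the network's own soft targets.
    ce = -math.log(softmax(logits)[label])
    q = softmax(soft_target_logits, temperature)
    p = softmax(logits, temperature)
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl

def numeric_grad(loss_fn, weights, eps=1e-5):
    # Central finite differences; a real setup would use autograd instead.
    grads = []
    for i in range(len(weights)):
        up = weights[:i] + [weights[i] + eps] + weights[i + 1:]
        down = weights[:i] + [weights[i] - eps] + weights[i + 1:]
        grads.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grads

def keep_mask(weights, grads, sparsity):
    # SNIP-style stand-in criterion: keep the weights with the largest |w * g|.
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    n_keep = len(weights) - int(round(sparsity * len(weights)))
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(weights))]

def linear_logits(weights, x):
    # Toy 2-class linear model over a 2-dimensional input (flat weight list).
    return [weights[0] * x[0] + weights[1] * x[1],
            weights[2] * x[0] + weights[3] * x[1]]

# Step 1: score weights at initialization under the SD loss and build a mask.
init_weights = [0.5, 0.3, -0.4, 0.2]
x, label, soft_targets = [1.0, 0.0], 0, [1.0, 0.0]
loss_at = lambda w: sd_loss(linear_logits(w, x), soft_targets, label)
mask = keep_mask(init_weights, numeric_grad(loss_at, init_weights), sparsity=0.5)
# Step 2 (not shown): train only the unmasked weights with the same SD loss.
```

In this toy setup the second input feature is zero, so the two weights attached to it receive zero gradient from the SD loss and are the ones pruned at 50% sparsity.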

Local Pruning Global Pruned Network under Knowledge Distillation

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing.

A Feature-map Discriminant Perspective for Pruning Deep Neural Networks

Class-Aware Pruning for Efficient Neural Networks

Knowledge from the Original Network: Restore a Better Pruned Network with Knowledge Distillation

Neural Network Pruning with Residual-Connections and Limited-Data

Knapsack Pruning with Inner Distillation

Distilling the Knowledge in Data Pruning

Accelerating Convolutional Neural Networks By Group-Wise 2d-Filter Pruning

Pruning-and-distillation: One-stage Joint Compression Framework for CNNs Via Clustering

Progressive Multi-Level Distillation Learning for Pruning Network

A Pruning Method Based on the Dissimilarity of Angle among Channels and Filters

An Efficient Method for Model Pruning Using Knowledge Distillation with Few Samples.

Few Sample Knowledge Distillation for Efficient Network Compression

Using Distillation to Improve Network Performance after Pruning and Quantization

Multi-grained Pruning Method of Convolutional Neural Network.

Block-Wisely Supervised Network Pruning with Knowledge Distillation and Markov Chain Monte Carlo

Model Selection - Knowledge Distillation Framework for Model Compression

Pruning at a Glance: Global Neural Pruning for Model Compression

Optimization Based Layer-Wise Pruning Threshold Method for Accelerating Convolutional Neural Networks

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression