Abstract: Deep neural models have achieved remarkable performance on various supervised and unsupervised learning tasks, but deploying these large networks on resource-limited devices remains a challenge. As a representative model compression and acceleration method, knowledge distillation (KD) addresses this problem by transferring knowledge from heavy teachers to lightweight students. However, most distillation methods focus on imitating the responses of teacher networks and ignore the information redundancy of student networks. In this article, we propose a novel distillation framework, difference-based channel contrastive distillation (DCCD), which introduces channel contrastive knowledge and dynamic difference knowledge into student networks for redundancy reduction. At the feature level, we construct an efficient contrastive objective that broadens the student network's feature expression space and preserves richer information in the feature extraction stage. At the final output level, more detailed knowledge is extracted from teacher networks by taking the difference between multiview augmented responses of the same instance, which makes student networks more sensitive to minor dynamic changes. With these two components of DCCD, the student network gains contrastive and difference knowledge and reduces its overfitting and redundancy. Notably, the student approaches and even outperforms the teacher in test accuracy on CIFAR-100. We reduce the top-1 error to 28.16% on ImageNet classification and to 24.15% for cross-model transfer with ResNet-18. Empirical experiments and ablation studies on popular datasets show that the proposed method achieves state-of-the-art accuracy compared with other distillation methods.
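
The output-level idea described above (comparing responses to two augmented views of the same instance) can be illustrated with a minimal NumPy sketch. This is not the DCCD loss itself, whose exact form the abstract does not give. The function names `kd_loss` and `difference_loss`, the squared-error form of the difference term, and the temperature value are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Classic response-based KD: KL(teacher || student) on softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)


def difference_loss(s_view1, s_view2, t_view1, t_view2, temperature=4.0):
    """Hypothetical difference term (an assumption, not the paper's formula).

    The student is asked to reproduce how the teacher's softened response
    changes between two augmented views of the same instance.
    """
    d_t = softmax(t_view1, temperature) - softmax(t_view2, temperature)
    d_s = softmax(s_view1, temperature) - softmax(s_view2, temperature)
    return float(np.mean(np.sum((d_s - d_t) ** 2, axis=-1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for logits of 8 instances over 100 classes under two views.
    t1, t2 = rng.normal(size=(8, 100)), rng.normal(size=(8, 100))
    s1, s2 = rng.normal(size=(8, 100)), rng.normal(size=(8, 100))
    total = kd_loss(s1, t1) + kd_loss(s2, t2) + difference_loss(s1, s2, t1, t2)
    print(f"combined loss: {total:.4f}")
```

In this sketch, a student that matches the teacher on both views incurs zero difference loss, while a student whose output is identical across views is penalized whenever the teacher's output changes, which is one way to read "more sensitive to minor dynamic changes."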

A Novel Deep Learning Model Compression Algorithm

A Model Compression Method Using Significant Data and Knowledge Distillation

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing

Improved Model Compression Method Based on Information Entropy

Class-Aware Pruning for Efficient Neural Networks

Loss Constrains Added Squeeze and Excitation Blocks for Pruning Deep Neural Networks

DCCD: Reducing Neural Network Redundancy Via Distillation

Pruning at a Glance: Global Neural Pruning for Model Compression

Model Compression for Deep Neural Networks: A Survey

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Deep learning model compression using network sensitivity and gradients

On Compressing Deep Models by Low Rank and Sparse Decomposition

Efficient Network Compression Through Smooth-Lasso Constraint

Differential Evolution Based Layer-Wise Weight Pruning for Compressing Deep Neural Networks

Analysis of Model Compression Using Knowledge Distillation

Few Sample Knowledge Distillation for Efficient Network Compression

Anonymous Model Pruning for Compressing Deep Neural Networks

An efficient pruning and fine-tuning method for deep spiking neural network

On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee

Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM