Abstract: Deep neural models have achieved remarkable performance on various supervised and unsupervised learning tasks, but deploying these large networks on resource-limited devices remains a challenge. As a representative model compression and acceleration method, knowledge distillation (KD) addresses this problem by transferring knowledge from heavy teachers to lightweight students. However, most distillation methods focus on imitating the responses of teacher networks and ignore the information redundancy of student networks. In this article, we propose a novel distillation framework, difference-based channel contrastive distillation (DCCD), which introduces channel contrastive knowledge and dynamic difference knowledge into student networks for redundancy reduction. At the feature level, we construct an efficient contrastive objective that broadens the student network's feature expression space and preserves richer information in the feature extraction stage. At the final output level, more detailed knowledge is extracted from the teacher network by taking the difference between multiview augmented responses of the same instance, which makes the student network more sensitive to minor dynamic changes. With these two components of DCCD, the student network gains contrastive and difference knowledge and reduces its overfitting and redundancy. Notably, the student approaches and even outperforms the teacher in test accuracy on CIFAR-100. We reduce the top-1 error to 28.16% on ImageNet classification and to 24.15% for cross-model transfer with ResNet-18. Empirical experiments and ablation studies on popular datasets show that the proposed method achieves state-of-the-art accuracy compared with other distillation methods.
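The abstract does not give the loss formulas, so the following is only a minimal sketch of how the two ideas might look, not the authors' implementation. It assumes (hypothetically) an InfoNCE-style objective over channels, where the student's channel i is pulled toward the teacher's channel i and pushed away from the teacher's other channels, and a KL term between the temperature-softened teacher and student response differences across two augmented views. The function names, the cosine similarity, and the parameters `tau` and `temperature` are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def difference_loss(t_view1, t_view2, s_view1, s_view2, temperature=4.0):
    """Assumed form of 'difference knowledge': match the student's change in
    logits between two augmented views to the teacher's change."""
    t_diff = [a - b for a, b in zip(t_view1, t_view2)]
    s_diff = [a - b for a, b in zip(s_view1, s_view2)]
    return kl_divergence(softmax(t_diff, temperature),
                         softmax(s_diff, temperature))


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two flattened channel maps."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def channel_contrastive_loss(student_channels, teacher_channels, tau=0.1):
    """Assumed InfoNCE-style loss over channels: the positive pair is the
    same channel index in student and teacher; other teacher channels are
    negatives. Inputs are lists of C flattened channel vectors."""
    losses = []
    for i, s in enumerate(student_channels):
        sims = [cosine(s, t) / tau for t in teacher_channels]
        peak = max(sims)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in sims))
        losses.append(log_denom - sims[i])  # -log softmax at the positive
    return sum(losses) / len(losses)


# Toy usage with made-up numbers (not data from the paper).
teacher_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_student = [list(c) for c in teacher_feats]
shuffled_student = [teacher_feats[1], teacher_feats[2], teacher_feats[0]]

t1, t2 = [2.0, 0.5, -1.0], [1.5, 0.7, -0.8]
loss_same = difference_loss(t1, t2, t1, t2)
loss_other = difference_loss(t1, t2, [0.0, 0.0, 0.0], [1.0, -1.0, 0.0])
```

In this toy setup the channel loss is small when student channels line up with the teacher's and large when they are permuted, and the difference loss vanishes when the student reproduces the teacher's view-to-view change. In practice such terms would typically be combined with a standard classification loss using weights that the abstract does not specify.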

Self-Knowledge Distillation via Feature Enhancement for Speaker Verification

DCCD: Reducing Neural Network Redundancy Via Distillation

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification

Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification

Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification

One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation

Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement

End-to-End Feature Learning for Text-Independent Speaker Verification

Self-Knowledge Distillation in Natural Language Processing

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

Self-Distillation: Towards Efficient and Compact Neural Networks

Self-boosting for Feature Distillation

Lightweight Self-Knowledge Distillation with Multi-source Information Fusion

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Ensemble Knowledge Distillation of Self-Supervised Speech Models

Leveraging Non-Causal Knowledge via Cross-Network Knowledge Distillation for Real-Time Speech Enhancement