Abstract: Knowledge Distillation (KD) is a powerful technique for transferring knowledge between neural network models, in which a pre-trained teacher model guides the training of a target student model. However, a suitable teacher model is not always available. To address this challenge, Self-Knowledge Distillation (SKD) constructs a teacher from the student model itself. Existing SKD methods either add Auxiliary Classifiers (AC) to intermediate layers of the model, or use historical models and models fed with different input data from the same class. However, these methods are computationally expensive and capture only time-wise and class-wise features of the data. In this paper, we propose a lightweight SKD framework that uses multi-source information to construct a more informative teacher. Specifically, we introduce a Distillation with Reverse Guidance (DRG) method that considers the different levels of information extracted by the model, including the edge, shape, and detail of the input data. Additionally, we design a Distillation with Shape-wise Regularization (DSR) method that enforces a consistent shape of the ranked model output across all data. We validate the performance of DRG, DSR, and their combination through comprehensive experiments on various datasets and models. Our results demonstrate the superiority of the proposed methods over baselines (by up to 2.87%) and over state-of-the-art SKD methods (by up to 1.15%), while remaining computationally efficient and robust. The code is available at https://github.com/xucong-parsifal/LightSKD.
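To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not taken from the LightSKD repository. `kd_loss` is the widely used temperature-softened KL distillation term. `ranked_shape_loss` is only one possible reading of "a consistent shape of ranked model output": it sorts each sample's probabilities and pulls them toward the batch-average sorted profile. All function names, the temperature values, and the choice of KL divergence for the shape term are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard soft-label distillation term: T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def ranked_shape_loss(batch_logits, temperature=1.0):
    """Hypothetical shape-wise regularizer (an assumption, not the paper's formula).

    Each sample's probabilities are sorted in descending order, which discards
    class identity and keeps only the 'shape' of the output. Every sorted
    profile is then compared with the batch-average sorted profile.
    """
    ranked = [sorted(softmax(l, temperature), reverse=True) for l in batch_logits]
    n = len(ranked)
    mean_shape = [sum(column) / n for column in zip(*ranked)]
    return sum(kl_divergence(mean_shape, r) for r in ranked) / n


if __name__ == "__main__":
    student = [1.0, 0.5, -0.2]
    teacher = [2.0, 0.1, -1.0]
    print("KD loss:", kd_loss(student, teacher))
    # The two samples below predict different classes but have the same
    # sorted profile, so the shape term is (numerically) zero.
    print("Shape loss:", ranked_shape_loss([[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]))
```

In a self-distillation setting the teacher logits would come from the model itself rather than from a separate pre-trained network; how the paper builds that teacher (the DRG part) is not specified in the abstract and is not modeled here.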

Speaker Change Detection with Weighted-sum Knowledge Distillation Based on Self-supervised Pre-trained Models

Explore the Use of Self-supervised Pre-trained Acoustic Features on Disguised Speech Detection

End-to-end Spoofing Speech Detection and Knowledge Distillation under Noisy Conditions

Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification

Adaptive Knowledge Distillation between Text and Speech Pre-trained Models

Ensemble Knowledge Distillation of Self-Supervised Speech Models

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-trained Models

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Accelerating Multiple Intent Detection and Slot Filling Via Targeted Knowledge Distillation

Self-Knowledge Distillation via Feature Enhancement for Speaker Verification

Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement

Lightweight Self-Knowledge Distillation with Multi-source Information Fusion

One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

Speaker Change Detection for Transformer Transducer ASR

Improve Knowledge Distillation via Label Revision and Data Selection

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

Weakly Supervised Change Detection via Knowledge Distillation and Multiscale Sigmoid Inference

Dynamic Knowledge Distillation for Pre-trained Language Models

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification