Self-Knowledge Distillation via Feature Enhancement for Speaker Verification

Bei Liu, Haoyu Wang, Zhengyang Chen, Shuai Wang, Yanmin Qian
DOI: https://doi.org/10.1109/icassp43922.2022.9746529
2022-05-23
Abstract: Deep speaker embedding learning has recently become the predominant approach to speaker verification. Very large neural networks such as ECAPA-TDNN and ResNet achieve state-of-the-art performance. However, large models are generally computationally unfriendly, requiring massive storage and computation resources. Model compression has become a hot research topic, but parameter quantization usually results in significant performance degradation, and knowledge distillation demands a pretrained complex teacher model. In this paper, we introduce a novel self-knowledge distillation method, Self-Knowledge Distillation via Feature Enhancement (SKDFE). It utilizes an auxiliary self-teacher network to distill its own refined knowledge, without the need for a pretrained teacher network. Additionally, we apply self-knowledge distillation at two different levels: the label level and the feature level. Experiments on the VoxCeleb dataset show that the proposed method enables small models to achieve performance comparable to, or even better than, that of large ones. Large models can also be further improved by applying our method.
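The abstract does not give the loss functions, so the following is only a minimal sketch of what distillation at the label level and the feature level could look like. It assumes a temperature-softened KL divergence between the self-teacher's and the student's speaker posteriors for the label level, and a mean squared error between their features for the feature level. These are common choices in knowledge distillation, not necessarily those used in SKDFE. All function names, the weights `alpha` and `beta`, and the `temperature` value are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def label_level_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed label-level term: KL(teacher || student) on softened posteriors."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    # The T^2 factor is the usual rescaling for softened targets.
    return float(kl.mean() * temperature ** 2)


def feature_level_loss(student_feat, teacher_feat):
    """Assumed feature-level term: mean squared error between features."""
    return float(np.mean((student_feat - teacher_feat) ** 2))


def self_distillation_loss(student_logits, teacher_logits,
                           student_feat, teacher_feat,
                           alpha=1.0, beta=1.0, temperature=2.0):
    """Weighted sum of the two assumed terms (alpha, beta are hypothetical)."""
    label_term = label_level_loss(student_logits, teacher_logits, temperature)
    feature_term = feature_level_loss(student_feat, teacher_feat)
    return alpha * label_term + beta * feature_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy batch: 4 utterances, 10 speaker classes, 16-dimensional features.
    s_logits = rng.normal(size=(4, 10))
    t_logits = rng.normal(size=(4, 10))
    s_feat = rng.normal(size=(4, 16))
    t_feat = rng.normal(size=(4, 16))
    print(self_distillation_loss(s_logits, t_logits, s_feat, t_feat))
```

In training, a term like this would typically be added to the ordinary speaker classification loss, with the self-teacher's outputs treated as fixed targets for the student branch; how SKDFE combines these terms is not stated in the abstract.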