Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification

Feng Wang, Lingyan Huang, Tao Li, Qingyang Hong, Lin Li
DOI: https://doi.org/10.21437/interspeech.2023-1557
2023-01-01
Abstract: Conformer-based architectures have proven effective at improving spoken language identification (LID) performance in recent years, owing to the Conformer's superior representational capacity. However, a significant drop in performance is often observed when language identification is performed on short speech segments. In this paper, we alleviate this issue by introducing a self-knowledge distillation technique into a Conformer-based LID architecture whose encoder is pretrained on an ASR task. For each training sample, we distill the predictive distribution between the original input and the same input processed by a double-ended random masking module. Experimental results demonstrate the effectiveness of the method on two datasets: OLR21, sampled at 16,000 Hz, and LRE22, sampled at 8,000 Hz. The method also improves language identification performance on short-duration speech segments.
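The training idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the double-ended random masking module or the distillation loss is defined, so everything below is an assumption made for illustration: the function names, the `max_mask_ratio` and `temperature` parameters, the choice to zero out frames at both ends of the utterance, and the use of KL divergence from the full-input prediction to the masked-input prediction are all hypothetical and may differ from the authors' method.

```python
import numpy as np


def double_ended_random_mask(features, max_mask_ratio=0.25, rng=None):
    """Zero out a random number of frames at both ends of an utterance.

    features: array of shape (num_frames, feat_dim).
    Assumption: masking means zeroing frames; the paper may instead truncate.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_frames = features.shape[0]
    max_mask = int(num_frames * max_mask_ratio)
    head = int(rng.integers(0, max_mask + 1))  # frames masked at the start
    tail = int(rng.integers(0, max_mask + 1))  # frames masked at the end
    masked = features.copy()
    masked[:head] = 0.0
    if tail > 0:
        masked[num_frames - tail:] = 0.0
    return masked


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def self_distillation_loss(logits_full, logits_masked, temperature=1.0):
    """Mean KL(p_full || p_masked) over a batch of language logits.

    Assumption: the full-input prediction acts as the teacher distribution.
    """
    log_p = log_softmax(logits_full / temperature)
    log_q = log_softmax(logits_masked / temperature)
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl.mean())
```

In a training loop, the same model would presumably score both the original and the masked features, and a term like this would be added to the usual classification loss; the weighting between the two terms is not stated in the abstract.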