Introducing Self-Supervised Phonetic Information for Text-Independent Speaker Verification

Ziyang Zhang, Wu Guo, Bin Gu
DOI: https://doi.org/10.21437/interspeech.2023-1558
2023-01-01
Abstract: This paper presents a novel multi-task learning framework that introduces self-supervised phonetic information into deep speaker embedding extraction. The primary task is still to classify speakers, but we add an auxiliary task that identifies phoneme boundaries in speech signals following the Noise Contrastive Estimation principle. To further use the self-supervised information to assist speaker feature learning, the features of intermediate layers in the main task are refined by the features of the corresponding layers in the auxiliary task through masking and biasing operations. We evaluate on the VoxCeleb1 and CN-Celeb datasets, and the results consistently verify the efficacy of the proposed method.