Adaptive Large Margin Fine-Tuning for Robust Speaker Verification

Leying Zhang, Zhengyang Chen, Yanmin Qian
DOI: https://doi.org/10.1109/icassp49357.2023.10094744
2023-01-01
Abstract: Large margin fine-tuning (LMFT) is an effective strategy for improving speaker verification performance and is widely used in speaker verification challenge systems. Because a large margin in the loss function can make the training task too difficult, LMFT typically uses longer training segments to alleviate this problem. However, this can create a duration mismatch between the LMFT model and real-world verification, where the verification speech may be very short. In our experiments, we also find that LMFT fails in short-duration and other verification scenarios. To solve this problem, we propose duration-based and similarity-based adaptive large margin fine-tuning (ALMFT) strategies. To verify their effectiveness, we construct fixed-length, variable-length, and asymmetric verification trials based on VoxCeleb1. Experimental results demonstrate that the ALMFT algorithms are effective and robust: they achieve improvements comparable to LMFT on the official VoxCeleb evaluation trials while also overcoming the performance degradation in short-duration and asymmetric scenarios.
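To make the duration-based idea concrete, the following is a minimal sketch of how a margin might be tied to training-segment duration inside an additive angular margin (AAM) softmax loss. It is an illustration under stated assumptions, not the formulation from the paper: the linear schedule, the function names `duration_margin` and `aam_softmax_loss`, and all default values (durations, margins, scale) are hypothetical, and the abstract does not say which margin-based loss the authors use.

```python
import math


def duration_margin(duration_s, min_dur=2.0, max_dur=6.0,
                    min_margin=0.2, max_margin=0.5):
    """Hypothetical schedule: map segment duration (seconds) to a margin.

    Short segments get a small margin, long segments a large one.
    The linear form and all defaults are assumptions for illustration.
    """
    ratio = (duration_s - min_dur) / (max_dur - min_dur)
    ratio = min(1.0, max(0.0, ratio))  # clamp to [0, 1]
    return min_margin + ratio * (max_margin - min_margin)


def aam_softmax_loss(cosines, target, margin, scale=32.0):
    """AAM-softmax loss for a single sample.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true speaker.
    margin:  angular margin (radians) added to the target-class angle.
    """
    logits = []
    for i, c in enumerate(cosines):
        c = min(1.0, max(-1.0, c))  # guard acos against rounding error
        if i == target:
            # Penalize the target class by enlarging its angle; cap at pi
            # so the cosine stays monotonically decreasing in the margin.
            theta = min(math.pi, math.acos(c) + margin)
            c = math.cos(theta)
        logits.append(scale * c)
    # Numerically stable log-sum-exp, then cross-entropy for the target.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    cosines = [0.8, 0.1, -0.2]  # made-up similarities, true speaker at index 0
    for dur in (2.0, 4.0, 6.0):
        m = duration_margin(dur)
        print(dur, m, aam_softmax_loss(cosines, target=0, margin=m))
```

With the same cosine similarities, a longer segment receives a larger margin and therefore a larger loss, which reflects the intuition in the abstract that a large margin is a harder task better suited to longer training segments.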