Improving Speaker Verification with Noise-Aware Label Ensembling and Sample Selection: Learning and Correcting Noisy Speaker Labels

Zhihua Fang, Liang He, Lin Li, Ying Hu
DOI: https://doi.org/10.1109/taslp.2024.3407527
2024-01-01
Abstract: Supervised deep learning has achieved tremendous success in speaker verification. However, deep speaker models tend to overfit noisy labels when these are present in speaker datasets. To mitigate the detrimental effects of noisy labels, we propose a novel Label Ensembling and Sample Selection framework. First, we treat samples whose labels have high confidence rankings as clean. Second, we use predictions from different epochs during training to smoothly correct the noisy labels. Our method does not require staged training and integrates learning from noisy labels, selecting clean labels, and correcting noisy labels in a single process. Extensive experiments demonstrate the robustness of our method under noisy labels. Even when the training data contains 50% noisy labels, our method mitigates an average of 86.54% of the performance degradation relative to standard training. Ablation experiments and further analysis validate the effectiveness of High Confidence Ranking for sample selection and the correctness of Label Ensembling for noisy label correction.
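The abstract names two components without giving their formulas, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "high confidence ranking" means the given label ranks among the top-k classes in the model's predicted probabilities, and that "label ensembling" can be approximated by an exponential moving average of per-epoch predictions. The function names, `top_k`, and `momentum` are all hypothetical.

```python
from typing import List


def label_rank(probs: List[float], label: int) -> int:
    """Rank of the given label among the class probabilities (0 = most confident)."""
    return sum(1 for p in probs if p > probs[label])


def select_clean(predictions: List[List[float]], labels: List[int],
                 top_k: int = 1) -> List[int]:
    """Indices of samples whose given label ranks within the top_k classes.

    Assumption: this rank-based rule stands in for the paper's
    High Confidence Ranking criterion, which the abstract does not specify.
    """
    return [i for i, (p, y) in enumerate(zip(predictions, labels))
            if label_rank(p, y) < top_k]


def ensemble_labels(soft_labels: List[List[float]],
                    predictions: List[List[float]],
                    momentum: float = 0.9) -> List[List[float]]:
    """Blend the running soft labels with the current epoch's predictions.

    Assumption: an exponential moving average is used here as a generic
    way to combine predictions across epochs into smoothed labels.
    """
    return [
        [momentum * s + (1.0 - momentum) * p for s, p in zip(s_row, p_row)]
        for s_row, p_row in zip(soft_labels, predictions)
    ]


# Toy usage: four utterances, two speakers.
predictions = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 1, 1, 0]
clean_indices = select_clean(predictions, labels, top_k=1)
updated = ensemble_labels([[1.0, 0.0]], [[0.0, 1.0]], momentum=0.9)
```

In this toy run the first two utterances are kept as clean because the model's top prediction agrees with their labels, while the last two would be candidates for correction through the smoothed labels.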