Weighted Cluster-Range Loss and Criticality-Enhancement Loss for Speaker Recognition

Jianye Mo,Li Xu
DOI: https://doi.org/10.3390/app10249004
2020-01-01
Abstract:While traditional i-vector based methods are popular in the field of speaker recognition, deep learning has recently found more and more applications to the end-to-end models due to its attractive performance. One effective practice is the integration of attention mechanism into the Convolution Neural Networks (CNNs). In this work, a light-weight dual-path attention block is proposed by combining the self-attention and Convolutional Block Attention Module (CBAM), which helps to capture more multi-source features with neglectable extra time expense. Additionally, a Weighted Cluster-Range Loss (WCRL) is proposed to enhance the identification performance of Cluster-Range Loss (CRL) on indecisive samples. Besides, to address the low efficiency in the initial training stage of CRL, a novel Criticality-Enhancement Loss (CEL) is also presented. Both of the proposed loss functions could significantly promote the training efficiency and globally improve the recognition performance. Experimental results are presented to show the effectiveness of the proposed scheme, which achieves a competitive top-1 accuracy of 92.0%, top-5 accuracy of 97.6%, and Equal Error Rate (EER) of 3.5% on the VoxCeleb1 dataset.
What problem does this paper attempt to address?