Angular Softmax Loss for End-to-end Speaker Verification.

Yutian Li, Feng Gao, Zhijian Ou, Jiasong Sun
DOI: https://doi.org/10.1109/iscslp.2018.8706570
2018-01-01
Abstract: End-to-end speaker verification systems have received increasing interest. The traditional i-vector approach trains a generative model (basically a factor-analysis model) to extract i-vectors as speaker embeddings. In contrast, the end-to-end approach directly trains a discriminative model (often a neural network) to learn discriminative speaker embeddings; a crucial component is the training criterion. In this paper, we use angular softmax (A-softmax), which was originally proposed for face verification, as the loss function for feature learning in end-to-end speaker verification. By introducing margins between classes into the softmax loss, A-softmax can learn more discriminative features than softmax loss and triplet loss, and at the same time is easy and stable to use. We make two contributions in this work. 1) We introduce A-softmax loss into end-to-end speaker verification and achieve significant EER reductions. 2) We find that combining A-softmax for training the front-end with PLDA for back-end scoring further boosts the performance of end-to-end systems under the short-utterance condition (short in both enrollment and test). Experiments are conducted on part of the Fisher dataset and demonstrate the improvements from using A-softmax.
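To make the idea of an angular margin concrete, the following is a minimal single-sample sketch of A-softmax as it is usually formulated in the original face-verification work: class weights are normalized, biases are dropped, and the target-class logit uses a monotonic function psi(theta) = (-1)^k cos(m*theta) - 2k in place of cos(theta). The function names, the toy inputs, and the margin m = 4 are assumptions chosen for illustration; the abstract does not state the configuration used in the paper.

```python
import math

def psi(theta, m):
    """Margin function: (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / math.pi), m - 1)  # index of the angular interval
    return (-1) ** k * math.cos(m * theta) - 2 * k

def a_softmax_loss(x, weights, target, m=4):
    """A-softmax loss for one embedding x (illustrative sketch, not the paper's code).

    x       : list of floats, the speaker embedding
    weights : list of class weight vectors (normalized here, no bias)
    target  : index of the true speaker class
    m       : integer angular margin; m=1 reduces to softmax with normalized weights
    """
    x_norm = math.sqrt(sum(v * v for v in x))
    logits = []
    for j, w in enumerate(weights):
        w_norm = math.sqrt(sum(v * v for v in w))
        cos_theta = sum(a * b for a, b in zip(x, w)) / (x_norm * w_norm)
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if j == target:
            # Target class: replace cos(theta) with the margin function psi(theta).
            logits.append(x_norm * psi(math.acos(cos_theta), m))
        else:
            logits.append(x_norm * cos_theta)
    # Numerically stable log-sum-exp, then negative log-probability of the target.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return log_sum - logits[target]

# Toy example with two hypothetical speaker classes.
x = [1.0, 0.5]
W = [[1.0, 0.0], [0.0, 1.0]]
loss_plain = a_softmax_loss(x, W, target=0, m=1)
loss_margin = a_softmax_loss(x, W, target=0, m=4)
```

Because psi(theta) never exceeds cos(theta), the margin version assigns a larger loss to the same embedding, so training has to pull embeddings closer in angle to their own class weight than plain softmax would require.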