SpeakerGAN: Speaker identification with conditional generative adversarial network

Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, Haiyong Xie
DOI: https://doi.org/10.1016/j.neucom.2020.08.040
IF: 6
2020-01-01
Neurocomputing
Abstract: Current methods based on traditional i-vectors and deep neural networks (DNNs) have proven effective for speaker identification, especially on large-scale corpora. However, when the training dataset is small, overfitting may occur and degrade performance. Moreover, robust identification remains a challenging problem even under less strict requirements. This paper proposes a novel approach, SpeakerGAN, for speaker identification with a conditional generative adversarial network (CGAN). It enables the adversarial networks to distinguish real from fake samples and to predict class labels simultaneously. We configure the generator and the discriminator in SpeakerGAN with a gated convolutional neural network (CNN) and a modified residual network (ResNet) to obtain highly diverse generated samples and to increase network capacity. Multiple loss functions are combined and jointly optimized to encourage correct mapping and accelerate convergence. Experimental results show that SpeakerGAN reduces the classification error rate by 87% and 16% compared with the traditional i-vector system and the state-of-the-art DNN-based method, respectively. With limited training data, SpeakerGAN obtains significant improvements over the baselines. When 1.6 s of speech per speaker is used for testing, SpeakerGAN achieves an identification accuracy of 98.20%, which suggests its promise for short-utterance speaker identification. (C) 2020 Elsevier B.V. All rights reserved.
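The abstract says the discriminator both separates real from fake samples and predicts speaker labels, with several losses combined. It does not give the exact loss terms, so the sketch below assumes an auxiliary-classifier-GAN-style objective: binary cross-entropy on a real/fake logit plus categorical cross-entropy on speaker-class logits, summed with a hypothetical weight `lam_cls`. Every function name here is illustrative only; this is not the authors' formulation, and it omits any additional loss terms SpeakerGAN may use.

```python
import math
from typing import List, Sequence, Tuple

# Each discriminator output is assumed to be (real/fake logit, speaker-class logits).
Output = Tuple[float, Sequence[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(logit: float, target: float) -> float:
    """Binary cross-entropy for one real/fake logit (target 1 = real, 0 = fake)."""
    p = sigmoid(logit)
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(p + eps) + (1.0 - target) * math.log(1.0 - p + eps))


def softmax_ce(logits: Sequence[float], label: int) -> float:
    """Categorical cross-entropy for speaker-label prediction, via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def discriminator_loss(real: List[Output], fake: List[Output],
                       real_labels: List[int], fake_labels: List[int],
                       lam_cls: float = 1.0) -> float:
    """Hypothetical combined loss: push real -> 1, fake -> 0, and classify both
    real samples (true labels) and fake samples (conditioning labels)."""
    adv = sum(bce(a, 1.0) for a, _ in real) + sum(bce(a, 0.0) for a, _ in fake)
    cls = sum(softmax_ce(c, y) for (_, c), y in zip(real, real_labels))
    cls += sum(softmax_ce(c, y) for (_, c), y in zip(fake, fake_labels))
    return (adv + lam_cls * cls) / (len(real) + len(fake))


def generator_loss(fake: List[Output], fake_labels: List[int],
                   lam_cls: float = 1.0) -> float:
    """Hypothetical generator loss: fool the real/fake head and make the
    classifier recover the speaker label the sample was conditioned on."""
    adv = sum(bce(a, 1.0) for a, _ in fake)
    cls = sum(softmax_ce(c, y) for (_, c), y in zip(fake, fake_labels))
    return (adv + lam_cls * cls) / len(fake)
```

With two speakers, a discriminator that confidently labels a real sample as real and speaker 0, and a fake sample as fake and speaker 1, gets a loss near zero, while the generator loss on that same fake sample is large because the fake was detected. Adding the classification term to both players is what makes the adversarial training label-aware, which matches the abstract's description of predicting real/fake and class labels simultaneously.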
What problem does this paper attempt to address?