Abstract: Speaker recognition performance often degrades when the speaker's emotional state varies. We propose a framework that combines generative adversarial networks with speaker recognition to generate additional speaker-related emotional training speech features, improving robustness under different emotional conditions. Within this framework, a new speaker emotion-converted generative adversarial network (SEC-GAN) is developed for speaker recognition. Given neutral speech from the target speaker, SEC-GAN learns to generate speech features in other emotions while retaining speaker identity. In addition, a new loss function is designed to retain the speaker's internal information during feature reconstruction, and an emotion discriminator is introduced to classify the emotion of the speech features, which improves the quality of emotion generation. By training on the original neutral data and the generated data from native speakers in the Mandarin Affective Speech Corpus (MASC), our framework reduces the negative impact of emotion mismatch between speech utterances. This strategy could address a common real-world problem: most voice-controlled devices enroll a user's calm speech but fail to recognize the user's identity when they speak in another emotional state. On MASC, our framework reaches an accuracy of 57.59%, an improvement of 8.27% over the VGG baseline and 5.62% over x-vector. It also outperforms the existing state-of-the-art method ECAPA-TDNN and the other comparison methods.

Speaker Normalization for Self-supervised Speech Emotion Recognition

Learning Polynomial Function Based Neutral-Emotion Gmm Transformation For Emotional Speaker Recognition

Scores Selection for Emotional Speaker Recognition

Cost-Sensitive Learning for Emotion Robust Speaker Recognition

Emotion-State conversion for speaker recognition

Emotional speaker recognition based on similar neighbor phenomenon

Natural-Emotion Gmm Transformation Algorithm For Emotional Speaker Recognition

Emotional Speaker Recognition Based on Model Space Migration through Translated Learning.

Self-attention Transfer Networks for Speech Emotion Recognition

Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Towards Adversarial Learning of Speaker-Invariant Representation for Speech Emotion Recognition

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Speaker Identification from emotional and noisy speech data using learned voice segregation and Speech VGG

Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition

SEC-GAN for robust speaker recognition with emotional state dismatch

An efficient algorithm for recognition of emotions from speaker and language independent speech using deep learning

Dataset-Distillation Generative Model for Speech Emotion Recognition

Deep Normalization for Speaker Vectors

Adaptive Data Boosting Technique for Robust Personalized Speech Emotion in Emotionally-Imbalanced Small-Sample Environments

Speaker Attentive Speech Emotion Recognition