Abstract: The shape of a speaker's vocal organs changes with emotional state, so the acoustic space of short-time features under emotion deviates from the neutral acoustic space, which degrades speaker recognition performance. Features that deviate greatly from the neutral acoustic space are regarded as mismatched features, and they negatively affect speaker recognition systems. Emotion variation deforms features differently for different phonemes, so it is reasonable to build a finer model that detects mismatched features under each phoneme. However, because phoneme recognition is difficult, three kinds of acoustic class recognition (phoneme classes, a Gaussian mixture model (GMM) tokenizer, and a probabilistic GMM tokenizer) are proposed to replace it. We propose feature pruning and feature regulation methods to process the mismatched features and improve speaker recognition performance. In the feature regulation method, a transformation matrix that regulates the mismatched features is trained by maximizing the between-class distance and minimizing the within-class distance. Experiments on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, respectively, compared with the baseline GMM-UBM (universal background model) algorithm. Corresponding IR increases of 2.09% and 3.32% are obtained when our methods are applied to the state-of-the-art i-vector algorithm.
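
The feature pruning idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each acoustic class is modeled by a single diagonal-covariance Gaussian trained on neutral speech, that a frame is assigned to its most likely class (a hard stand-in for the GMM tokenizer), and that a frame is pruned when its log-likelihood under that class falls below a per-class threshold. The names `diag_gauss_loglik`, `tokenize`, `prune_frames`, the toy class parameters, and the threshold values are all hypothetical.

```python
import math


def diag_gauss_loglik(x, mean, var):
    """Log-density of frame x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tokenize(x, classes):
    """Return the index of the most likely acoustic class for frame x.

    Hypothetical hard-assignment stand-in for a GMM tokenizer.
    """
    return max(range(len(classes)),
               key=lambda k: diag_gauss_loglik(x, *classes[k]))


def prune_frames(frames, classes, thresholds):
    """Keep frames that fit the neutral model of their own acoustic class.

    A frame whose log-likelihood is below its class threshold is treated
    as a mismatched feature and dropped (assumed pruning rule).
    """
    kept = []
    for x in frames:
        k = tokenize(x, classes)
        if diag_gauss_loglik(x, *classes[k]) >= thresholds[k]:
            kept.append(x)
    return kept


# Hypothetical toy setup: two acoustic classes, each given as (mean, variance).
neutral_classes = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
thresholds = [-4.0, -4.0]  # assumed per-class log-likelihood thresholds

frames = [[0.1, -0.2], [5.2, 4.9], [2.5, 2.5], [0.0, 3.5]]
kept = prune_frames(frames, neutral_classes, thresholds)
print(kept)
```

In this toy setup the first two frames lie close to a neutral class mean and are kept, while the last two lie far from both classes and are pruned. Setting thresholds per class reflects the abstract's point that emotion deforms features differently across acoustic classes.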

Mismatched Feature Detection with Finer Granularity for Emotional Speaker Recognition.