Scores Selection for Emotional Speaker Recognition

Zhenyu Shan, Yingchun Yang
DOI: https://doi.org/10.1007/978-3-642-01793-3_51
2009-01-01
Abstract: Emotion variability between training and testing utterances is one of the largest challenges in speaker recognition. A common situation is that the training data is neutral speech while the testing data is a mixture of neutral and emotional speech. In this paper, we experimentally analyze the performance of a GMM-based verification system on utterances in this situation. The analysis reveals that verification performance improves as the emotion ratio decreases, and that the scores of a speaker's neutral features against his/her own model are distributed higher than the other three kinds of scores (neutral features against the models of other speakers, and non-neutral features against the speaker's own model and against the models of other speakers). Based on these observations, we propose a scores selection method that reduces the emotion ratio of the testing utterance by eliminating the non-neutral features. It is applicable to GMM-based recognition systems and does not require labeling the emotional state during testing. Experiments carried out on the MASC Corpus show that scores selection improves system performance, reducing the EER from 13.52% to 10.17%.
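The idea of scores selection can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so the code below assumes one plausible reading: each test frame already has a score against the claimed speaker's model (for example a per-frame log-likelihood ratio from a GMM), and only the highest-scoring fraction of frames is kept before averaging. The function names and the `keep_ratio` parameter are hypothetical and are not taken from the paper.

```python
def select_scores(frame_scores, keep_ratio=0.5):
    """Return the highest-scoring fraction of frame scores.

    Assumption (not from the paper): frames that score low against the
    claimed speaker's model are treated as likely non-neutral and dropped.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Always keep at least one frame so the average is defined.
    n_keep = max(1, int(keep_ratio * len(frame_scores)))
    return sorted(frame_scores, reverse=True)[:n_keep]


def utterance_score(frame_scores, keep_ratio=0.5):
    """Average the selected frame scores into one utterance-level score."""
    kept = select_scores(frame_scores, keep_ratio)
    return sum(kept) / len(kept)
```

Calling `utterance_score(scores, keep_ratio=1.0)` reproduces the plain frame average, which makes it easy to compare against a system without selection. Because selection raises the scores of impostor trials as well as genuine ones, the decision threshold would need to be tuned on scores computed with the same `keep_ratio`.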