Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition.

Zhixuan Li, Liang He, Jingyang Li, Li Wang, Wei-Qiang Zhang
DOI: https://doi.org/10.21437/interspeech.2019-1683
2019-01-01
Abstract: Speech emotion recognition (SER) is a challenging task: complex emotional expressions make it difficult to discriminate between emotions, and unbalanced data misleads models into giving biased predictions. In this work, we tackle these two problems with the angular softmax loss. First, we replace the vanilla softmax with angular softmax to learn emotional representations with strong discriminative power. In addition, inspired by its novel geometric interpretation, we establish a general calculation model and derive a concise formula for the decision domain. Based on these derivations, we propose our solution to data imbalance: class-specific angular softmax, by which we can directly adjust the decision domains of different emotion classes. Experimental results on the IEMOCAP corpus show significant improvements on two state-of-the-art models, demonstrating the effectiveness of the proposed methods.
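The abstract does not give the loss itself, so the following is only a rough sketch of how an angular softmax with per-class margins might look. It assumes the commonly used A-Softmax form (unit-norm class weights, no bias, integer margin m applied to the target-class angle) and assumes that "class-specific" can be realized as one margin per class. The names `psi`, `class_specific_asoftmax_loss`, and `margins` are hypothetical and are not taken from the paper.

```python
import numpy as np

def psi(theta, m):
    # Monotonically decreasing extension of cos(m * theta) over [0, pi],
    # as used in the common A-Softmax formulation (assumed here).
    k = np.minimum(np.floor(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def class_specific_asoftmax_loss(x, W, labels, margins):
    # x: (N, D) embeddings, W: (D, C) class weights,
    # labels: (N,) integer class ids,
    # margins: (C,) integer margin per class (hypothetical parameterization).
    W_unit = W / np.linalg.norm(W, axis=0, keepdims=True)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    cos = np.clip((x @ W_unit) / x_norm, -1.0, 1.0)
    logits = x_norm * cos  # non-target classes keep ||x|| * cos(theta_j)

    idx = np.arange(len(labels))
    theta_y = np.arccos(cos[idx, labels])
    # Target-class logit uses the margin assigned to that sample's class.
    logits[idx, labels] = x_norm[:, 0] * psi(theta_y, margins[labels])

    # Numerically stable log-softmax followed by cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, labels].mean()
```

With every margin set to 1 this reduces to an ordinary softmax cross-entropy over weight-normalized logits. Raising the margin for a class makes the angular criterion stricter for samples of that class. How the paper actually chooses per-class values is not stated in the abstract.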