Noise-label Suppressed Module for Speech Emotion Recognition.

Xingcan Liang, Linsen Xu, Zhipeng Liu, Xiang Sui, Jinfu Liu
DOI: https://doi.org/10.1145/3598151.3598176
2023-01-01
Abstract: Speech emotion recognition (SER) has become an attractive topic owing to its broad range of applications. Segmentation is often used to increase the amount of training data for SER, but each segment inherits the label of its source utterance, and this inherited label may not match the segment's actual emotion, which can degrade performance. In this paper, we propose a robust noise-label-suppressed module that relabels segments to suppress the adverse effects of inherited labels. First, the log Mel spectrogram of the speech, together with its deltas and delta-deltas, is computed for each segment. Then, speech features are extracted from this 3-D data by a feature extraction model. Finally, the label of each segment is corrected by the relabel model. Experimental results on the IEMOCAP dataset show that the proposed noise-label-suppressed module outperforms other advanced methods and achieves robust performance.
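The input construction described in the abstract (log Mel spectrogram plus deltas and delta-deltas, cut into segments that inherit the utterance label) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stack_3d` and `segment`, the segment length, and the hop size are hypothetical, and the deltas are approximated with simple central differences (`np.gradient`) rather than the regression-window deltas commonly used in speech toolkits. The log Mel spectrogram is assumed to be already computed, and the relabel model is not covered here because the abstract does not specify it.

```python
import numpy as np


def stack_3d(log_mel: np.ndarray) -> np.ndarray:
    """Stack static, delta, and delta-delta features as three channels.

    log_mel has shape (n_mels, n_frames). Deltas are approximated by
    central differences along the time axis (an assumption, not the
    paper's stated method).
    """
    delta = np.gradient(log_mel, axis=1)
    delta_delta = np.gradient(delta, axis=1)
    return np.stack([log_mel, delta, delta_delta], axis=0)


def segment(features_3d: np.ndarray, label: int, seg_len: int, hop: int):
    """Cut (3, n_mels, n_frames) features into fixed-length segments.

    Every segment inherits the utterance-level label; these inherited
    labels are the potentially noisy ones a relabel step would correct.
    """
    n_frames = features_3d.shape[-1]
    if n_frames < seg_len:
        raise ValueError("utterance is shorter than one segment")
    starts = range(0, n_frames - seg_len + 1, hop)
    segments = np.stack([features_3d[..., s:s + seg_len] for s in starts])
    labels = np.full(len(segments), label)
    return segments, labels


# Usage with a random stand-in for a real log Mel spectrogram.
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((40, 300))   # 40 Mel bands, 300 frames
features = stack_3d(log_mel)               # shape (3, 40, 300)
segments, labels = segment(features, label=2, seg_len=100, hop=50)
print(segments.shape, labels.shape)
```

Each segment is a 3-channel array suitable as input to a convolutional feature extractor; the inherited `labels` array is the quantity a relabeling step would then revise per segment.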