A Joint Network Based on Interactive Attention for Speech Emotion Recognition

Ying Hu, Shijing Hou, Huamin Yang, Hao Huang, Liang He
DOI: https://doi.org/10.1109/icme55011.2023.00295
2023-01-01
Abstract: Speech emotion recognition (SER) plays a vital role in human-machine interaction. In this paper, we propose a standalone spectrum-based SER model and a joint network that combines a pre-trained model with the spectrum-based model. In the joint network, we design an interactive attention module to effectively fuse the intermediate features from the two models. Our standalone spectrum-based model outperforms four spectrum-based baselines under the speaker-dependent setting. To reflect real application scenarios, we compare the joint network with six methods that use pre-trained models under the speaker-independent setting. Experimental results show that the joint network achieves the best performance among four unimodal models, with an unweighted accuracy (UA) of 73.32% and a weighted accuracy (WA) of 72.48%.
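The two reported metrics can be illustrated with a minimal sketch, assuming the definitions conventionally used in SER: WA is the overall fraction of correctly classified utterances, and UA is the mean of the per-class recalls, so every emotion class counts equally regardless of its size. The function names, emotion labels, and toy predictions below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: each emotion class contributes equally."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly predicted utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Hypothetical, deliberately imbalanced toy data (not from the paper).
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "happy", "happy"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

On imbalanced data the two values diverge, because the majority class dominates WA while a missed minority class pulls UA down. This is why SER results are usually reported with both.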