Efficient Channel Attention for Speech Emotion Recognition with Asymmetric Convolution Module

TianQi Wu, Liejun Wang, Shuli Cheng, Jiang Zhang
DOI: https://doi.org/10.1109/prml59573.2023.10348239
2023-01-01
Abstract: Speech emotion recognition (SER) is a challenging task due to the diversity and complexity of emotions. Several issues remain unresolved, including excessive redundant information in the extracted features, hard-to-classify samples, and class imbalance. To address these issues, we propose an asymmetric convolution module (ACM), which captures the spatial information of feature maps by improving the CNN structure, and efficient channel attention (ECA), which establishes dependencies between channels. Combining the two extracts more useful emotional information from the feature maps and improves classification accuracy. The proposed model achieves state-of-the-art results on IEMOCAP improvised data, and reaches 70.99% and 89.78% unweighted accuracy (UA) on the full IEMOCAP and EMO-DB corpora, respectively.
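To make the channel-attention idea concrete, the following is a minimal NumPy sketch of ECA as it is commonly described: each channel is pooled to a single descriptor, a small 1-D convolution mixes neighbouring channel descriptors, and a sigmoid turns the result into per-channel weights. It is an illustration under stated assumptions, not the authors' implementation: the function name `eca`, the fixed `kernel` values, and the tensor shape are hypothetical, and in a real model the kernel would be learned.

```python
import numpy as np

def eca(feature_map, kernel):
    """Sketch of efficient channel attention (assumed form, not the paper's code).

    feature_map: array of shape (C, H, W).
    kernel: 1-D array of odd length k; hypothetical fixed weights that
            would normally be learned during training.
    Returns the reweighted feature map and the per-channel weights.
    """
    c = feature_map.shape[0]
    k = kernel.shape[0]
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = feature_map.mean(axis=(1, 2))
    # Local cross-channel interaction: slide the kernel along the channel
    # axis, zero-padding so that the output again has C entries.
    padded = np.pad(desc, k // 2)
    logits = np.array([padded[i:i + k] @ kernel for i in range(c)])
    # Sigmoid maps each logit to a weight in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-logits))
    # Rescale every channel by its attention weight.
    return feature_map * weights[:, None, None], weights

# Usage with an arbitrary 8-channel feature map and a 3-tap kernel.
x = np.random.default_rng(0).standard_normal((8, 4, 5))
out, w = eca(x, np.array([0.25, 0.5, 0.25]))
```

Because the convolution only spans `k` neighbouring channels, the attention step adds very few parameters, which is the usual motivation for preferring it over a fully connected channel-attention layer.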