A Prior-knowledge Guided Multi-scale Deep Network for Speech Emotion Recognition

Jun He, Yangcai Zhong, Penghao Rao, Bo Sun, Yinghui Zhang
DOI: https://doi.org/10.1145/3512353.3512380
2022-01-01
Abstract: Speech emotion recognition (SER) is a challenging task whose performance depends heavily on how effective its features are for classification. At present, many researchers use signal processing or data-driven deep learning methods to obtain expressive features. However, most of these features are single-scale and represent only local speech information, and the methods do not fully learn the underlying knowledge. We therefore propose a multi-scale convolutional recurrent neural network with an attention mechanism (AMCRNN) that extracts multi-scale features for a more comprehensive representation of speech. In addition, we introduce prior knowledge to guide the model's discriminative learning. The proposed model is evaluated on two datasets, CHEAVD2.0 and IEMOCAP, and the results show that our method achieves comparable performance.
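The multi-scale idea in the abstract can be illustrated with a small sketch. This is not the authors' AMCRNN: the function names, the kernel sizes (3, 5, 7), and the input values are all hypothetical, fixed averaging kernels stand in for learned convolution filters, the attention score of a time step is simply its own value, and the recurrent layer is omitted entirely. It only shows how filtering one frame sequence at several window widths, then attention-pooling each result, yields one feature per scale.

```python
import math

def conv1d_same(frames, kernel):
    """Filter a 1-D sequence with a kernel, zero-padding so the length is kept."""
    half = len(kernel) // 2
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(frames):
                acc += w * frames[idx]
        out.append(acc)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(sequence):
    """Weight each time step by a softmax over its own value, then sum.

    Assumption: a real attention layer would learn its scoring function.
    """
    weights = softmax(sequence)
    return sum(w * x for w, x in zip(weights, sequence))

def multi_scale_features(frames, kernel_sizes=(3, 5, 7)):
    """Return one pooled feature per scale (odd kernel sizes assumed)."""
    feats = []
    for size in kernel_sizes:
        kernel = [1.0 / size] * size  # averaging kernel, a stand-in for learned filters
        feats.append(attention_pool(conv1d_same(frames, kernel)))
    return feats

# Hypothetical per-frame energy values for a short utterance.
frames = [0.0, 1.0, 0.0, 2.0, 1.0, 0.0]
features = multi_scale_features(frames)
```

Here `features` holds three numbers, one per window width. In a trained network each scale would contribute a learned feature map rather than a single scalar, and the maps would be passed through recurrent layers before attention is applied.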