CM-TCN: Channel-Aware Multi-scale Temporal Convolutional Networks for Speech Emotion Recognition.

Tianqi Wu, Liejun Wang, Jiang Zhang
DOI: https://doi.org/10.1007/978-981-99-8067-3_34
2024-01-01
Abstract: Speech emotion recognition (SER) plays a crucial role in understanding user intent and improving human-computer interaction (HCI). Currently, the most widely used and effective methods are based on deep learning. Existing research shows that temporal information is increasingly important in SER. Although some advanced deep learning methods, such as convolutional neural networks (CNNs) and attention modules, achieve good results, they often ignore the temporal information in speech, which can lead to insufficient representation and low classification accuracy. To make full use of temporal features, we propose channel-aware multi-scale temporal convolutional networks (CM-TCN). First, a channel-aware temporal convolutional network (CATCN) is used as the basic structure to extract multi-scale temporal features combined with channel information. Then, global feature attention (GFA) captures global information at different time scales and enhances the important information. Finally, an adaptive fusion module (AFM) establishes the overall dependency among different network layers and fuses their features. We conduct extensive experiments on six datasets, and the experimental results demonstrate the superior performance of CM-TCN.
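The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind multi-scale temporal convolution: dilated causal 1-D convolutions applied at several dilation rates, with the branch outputs combined by softmax weights. It is not the authors' CATCN, GFA, or AFM. The function names, the shared kernel, the dilation rates, and the softmax-weighted sum (a simple stand-in for adaptive fusion) are all assumptions made for illustration.

```python
import math

def dilated_causal_conv1d(x, kernel, dilation):
    """Causal convolution: y[t] = sum_k kernel[k] * x[t - k * dilation].

    Positions before the start of the sequence are treated as zero,
    so y[t] never depends on future samples.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_scale_fuse(x, kernel, dilations, scores):
    """Hypothetical fusion: one branch per dilation, softmax-weighted sum.

    `scores` plays the role of learnable fusion logits (an assumption).
    """
    branches = [dilated_causal_conv1d(x, kernel, d) for d in dilations]
    weights = softmax(scores)
    return [
        sum(w * b[t] for w, b in zip(weights, branches))
        for t in range(len(x))
    ]

# Example: two time scales (dilation 1 and 2) fused with equal weights.
signal = [1, 2, 3, 4]
fused = multi_scale_fuse(signal, kernel=[1, 1], dilations=[1, 2], scores=[0.0, 0.0])
```

Larger dilations let a branch see further back in time with the same kernel size, which is why stacking or combining several dilation rates yields features at different time scales. In a trained model the kernels and fusion scores would be learned rather than fixed as they are here.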