GFRN-SEA: Global-Aware Feature Representation Network for Speech Emotion Analysis

Lei Pan, Qi Wang
DOI: https://doi.org/10.1109/access.2024.3490186
IF: 3.9
2024-11-09
IEEE Access
Abstract: With the rapid advancement of artificial intelligence and machine learning, speech emotion recognition (SER) holds significant potential across various applications. Despite progress, challenges persist in effectively extracting and fusing deep features from original audio data. This paper presents a Global-Aware Feature Representation Network for Speech Emotion Analysis (GFRN-SEA), a novel SER model that integrates three distinct levels of audio features: Mel-Frequency Cepstral Coefficients (MFCC), spectrogram features, and HuBERT features. By processing the spectrogram through a ResNet encoder, MFCC through a BiLSTM encoder, and raw audio waveforms through the HuBERT model, we ensure a comprehensive capture of the temporal, spatial, and spatiotemporal features of the audio signal. To further refine the extracted features, GFRN-SEA introduces a Multi-dimensional and Multi-scale (MDMS) convolution module. This module applies convolutions at multiple scales, capturing fine-grained details and broader contextual information from spectrogram and MFCC features. Another key component of GFRN-SEA is the multi-layer cross attention (MLCA) mechanism. The MLCA fuses the extracted features by emphasizing the most relevant information across modalities and integrating spectrogram, MFCC, and HuBERT features to enhance synergy and improve emotion recognition performance. Finally, the fused features are input into a fully connected layer to classify the emotional state of the speaker. Our method leverages the complementary strengths of multi-dimensional and multi-scale feature extraction and sophisticated fusion techniques, achieving state-of-the-art performance on four mainstream public datasets.
computer science, information systems; telecommunications; engineering, electrical & electronic
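The abstract describes fusing spectrogram, MFCC, and HuBERT features through cross attention but gives no implementation details. As a rough illustration of the general idea only, the following is a minimal sketch of single-layer, single-head scaled dot-product cross attention in plain Python. It is not the authors' MLCA module: the function names (`cross_attention`, `mean_pool`, `fuse`), the choice of HuBERT frames as queries, the absence of learned projections, and the mean-pool-and-concatenate fusion are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross attention without learned projections.

    queries: list of T_q vectors (dimension d) from one feature stream.
    keys, values: lists of T_k vectors from another stream (keys have dimension d).
    Returns T_q vectors, each a weighted average of `values`.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query frame to every key frame.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value frames.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def mean_pool(frames):
    """Average a sequence of frame vectors into one utterance-level vector."""
    n = len(frames)
    return [sum(f[j] for f in frames) / n for j in range(len(frames[0]))]


def fuse(hubert, spec, mfcc):
    """Hypothetical fusion: HuBERT frames attend to the other two streams,
    then the pooled results are concatenated (an assumption, not the paper's design)."""
    attended_spec = cross_attention(hubert, spec, spec)
    attended_mfcc = cross_attention(hubert, mfcc, mfcc)
    return mean_pool(hubert) + mean_pool(attended_spec) + mean_pool(attended_mfcc)


# Toy 2-dimensional features standing in for real encoder outputs.
hubert = [[1.0, 0.0], [0.0, 1.0]]
spec = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mfcc = [[0.5, 0.5]]
fused = fuse(hubert, spec, mfcc)
print(len(fused))
```

In a real model the fused vector would be passed to a fully connected classifier, as the abstract states, and each stream would first be projected to a shared dimension by learned weights.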
What problem does this paper attempt to address?