Single- and Cross-Lingual Speech Emotion Recognition Based on WavLM Domain Emotion Embedding
Jichen Yang, Jiahao Liu, Kai Huang, Jiaqi Xia, Zhengyu Zhu, Han Zhang
DOI: https://doi.org/10.3390/electronics13071380
IF: 2.9
2024-04-06
Electronics
Abstract: Previous approaches to speech emotion recognition (SER) typically extract emotion embeddings from a classifier that consists of fully connected layers and is trained on the training data, without considering contextual information. This research instead integrates contextual information into the feature extraction process. The proposed approach is based on the WavLM representation and combines a contextual transform with fully connected layers, training data, and the corresponding label information to extract single-lingual WavLM domain emotion embeddings (SL-WDEEs) for single-lingual SER and cross-lingual WavLM domain emotion embeddings (CL-WDEEs) for cross-lingual SER. To extract CL-WDEEs, multi-task learning is employed to remove language information, which makes this the first work to extract emotion embeddings for cross-lingual SER. Experimental results on the IEMOCAP database demonstrate that the proposed SL-WDEE outperforms some commonly used features and known systems, while results on the ESD database indicate that the proposed CL-WDEE effectively recognizes cross-lingual emotions and outperforms many commonly used features.
engineering, electrical & electronic; computer science, information systems; physics, applied
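
The pipeline summarized in the abstract (frame-level WavLM features, a contextual transform, fully connected layers, and a multi-task objective that suppresses language information) can be illustrated with a rough forward-pass sketch. This is not the authors' implementation. Everything specific below is an assumption made for illustration: the contextual transform is stood in for by simple frame splicing, the feature dimension of 768 is borrowed from the WavLM Base hidden size, the class counts are placeholders, random arrays replace real WavLM outputs, and the adversarial sign on the language loss is only one possible reading of "multi-task learning is employed to remove language information". All names (`context_window`, `EmbeddingExtractor`, `encoder_objective`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def context_window(frames, k=2):
    """Splice each frame with its k left and k right neighbours (edge-padded).

    Assumption: a stand-in for the paper's contextual transform.
    """
    n_frames = frames.shape[0]
    padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
    return np.concatenate([padded[i:i + n_frames] for i in range(2 * k + 1)], axis=1)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class EmbeddingExtractor:
    """Hypothetical extractor: context splicing -> FC layer -> mean pooling."""

    def __init__(self, feat_dim=768, k=2, emb_dim=128, n_emotions=4, n_languages=2):
        self.k = k
        in_dim = feat_dim * (2 * k + 1)
        self.W1 = rng.normal(0.0, 0.01, (in_dim, emb_dim))
        self.b1 = np.zeros(emb_dim)
        self.W_emo = rng.normal(0.0, 0.01, (emb_dim, n_emotions))
        self.W_lang = rng.normal(0.0, 0.01, (emb_dim, n_languages))

    def embed(self, frames):
        # Utterance-level embedding: average the hidden activations over time.
        hidden = relu(context_window(frames, self.k) @ self.W1 + self.b1)
        return hidden.mean(axis=0)

    def forward(self, frames):
        emb = self.embed(frames)
        return emb, softmax(emb @ self.W_emo), softmax(emb @ self.W_lang)


def encoder_objective(p_emo, y_emo, p_lang, y_lang, lam=0.5):
    """Assumed adversarial multi-task objective for the shared layers only.

    The shared layers are rewarded for emotion accuracy and penalised when the
    language can be predicted; the language head itself would still be trained
    to minimise its own cross-entropy, as in gradient-reversal schemes.
    """
    ce_emo = -np.log(p_emo[y_emo])
    ce_lang = -np.log(p_lang[y_lang])
    return ce_emo - lam * ce_lang


# Random frames stand in for WavLM outputs of one utterance (50 frames).
frames = rng.normal(size=(50, 768))
model = EmbeddingExtractor()
embedding, p_emotion, p_language = model.forward(frames)
loss = encoder_objective(p_emotion, 1, p_language, 0)
```

In a single-lingual setting the language head and the `lam` term would simply be dropped, leaving the emotion classifier as the only training signal for the embedding.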