Speech Relationship Learning for Cross-Corpus Speech Emotion Recognition.

Yinru He, Guihua Wen, Pei Yang, Dongliang Chen
DOI: https://doi.org/10.1109/ICASSP48485.2024.10446440
2024-01-01
Abstract: Cross-corpus Speech Emotion Recognition (SER) aims to identify human emotions from speech across different speakers and languages. Previous work has focused on extracting, from individual samples, the domain-invariant features most relevant to emotion, while ignoring the rich relationships between speech instances, which also strongly influence the expressed emotion. To explore these relationships across multiple corpora, we introduce a novel cross-corpus SER architecture with speech relationship learning. Specifically, during training we apply an attention mechanism over the entire input batch, embedding features that are similar at the sample level in emotion space into new representations. Furthermore, we propose a dual-discriminator structure that improves the similarity calculation through adversarial training, and a domain-wise shared classifier with a batch label smoothing strategy that enhances the network's generalization ability. Experiments on the CASIA, EMODB and SAVEE datasets demonstrate that the proposed method outperforms state-of-the-art cross-corpus SER methods.
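As a rough illustration of attention applied over an entire input batch, the sketch below uses generic scaled dot-product attention in which every sample attends to every other sample in the same batch. This is an assumption made for illustration only: the abstract does not specify the paper's actual formulation, and the function name `batch_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the feature sizes are all hypothetical.

```python
import numpy as np

def batch_attention(features, w_q, w_k, w_v):
    """Mix each sample's features with those of the other samples in the batch.

    features: (batch, dim) array, one feature vector per utterance (assumed).
    w_q, w_k, w_v: (dim, dim) projection matrices (hypothetical parameters).
    Returns the new representations and the (batch, batch) attention weights.
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    # Pairwise similarity between samples, scaled by the key dimension.
    scores = q @ k.T / np.sqrt(k.shape[1])
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each output row is a weighted mix of all samples' value vectors.
    return weights @ v, weights

# Toy batch of 8 samples with 16-dimensional features (random placeholders).
rng = np.random.default_rng(0)
batch, dim = 8, 16
features = rng.standard_normal((batch, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
new_repr, weights = batch_attention(features, w_q, w_k, w_v)
```

Each row of `weights` sums to one, so every new representation is a convex combination of the projected features of all samples in the batch. The dual discriminators and the batch label smoothing strategy are not sketched here because the abstract gives no detail on how they are defined.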