Adversarial Domain Generalized Transformer for Cross-Corpus Speech Emotion Recognition

Yuan Gao, Longbiao Wang, Jiaxing Liu, Jianwu Dang, Shogo Okada
DOI: https://doi.org/10.1109/taffc.2023.3290795
IF: 13.99
2023-01-01
IEEE Transactions on Affective Computing
Abstract: Speech emotion recognition (SER) promotes the development of intelligent devices, which enable natural and friendly human-computer interactions. However, the recognition performance of existing approaches is significantly reduced on unseen datasets, and the lack of sufficient training data limits the generalizability of deep learning models. In this work, we analyze the impact of the domain generalization method on cross-corpus SER and propose an adversarial domain generalized transformer (ADoGT), which aims to learn a shared feature distribution for the source and target domains. Specifically, we investigate the effect of domain adversarial learning by eliminating nonaffective information. We also combine the center loss with the softmax function as joint supervision to learn discriminative features. Moreover, we introduce unsupervised transfer learning to extract additional features, and incorporate a gated fusion model to learn the complementary information of the features learned by the supervised feature extractor and the pretrained model. The proposed transformer-based domain generalization method is evaluated using four emotional datasets. We also provide an ablation study of different domain adversarial model structures and feature fusion models. The results of comparative experiments demonstrate the effectiveness of the proposed ADoGT.
computer science, cybernetics, artificial intelligence
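
The joint supervision mentioned in the abstract (center loss combined with the softmax function) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the commonly used center-loss form (half the squared distance between a feature and its class center) added to softmax cross-entropy. The function names and the weight `lam` are hypothetical and chosen only for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against the integer `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits, feature, label, centers, lam=0.1):
    """Softmax cross-entropy plus a center-loss term.

    `lam` is a hypothetical weighting value, not one taken from the paper.
    """
    return softmax_cross_entropy(logits, label) + lam * center_loss(
        feature, centers[label]
    )
```

In this sketch the cross-entropy term separates the emotion classes, while the center term pulls each feature toward the center of its own class, which is one way to encourage compact, discriminative features. In a trained model the centers would be updated alongside the network parameters. Here they are passed in as fixed lists.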