Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition
Cheng Lu, Yuan Zong, Chuangao Tang, Hailun Lian, Hongli Chang, Jie Zhu, Sunan Li, Yan Zhao
DOI: https://doi.org/10.3390/electronics11172745
IF: 2.9
2022-09-01
Electronics
Abstract: In this paper, we investigate the problem of cross-corpus speech emotion recognition (SER), in which the training (source) and testing (target) speech samples belong to different corpora. This setting leads to a feature distribution mismatch between the source and target speech samples, and, as a result, the performance of most existing SER methods drops sharply. To solve this problem, we propose a simple yet effective transfer subspace learning method called joint distribution implicitly aligned subspace learning (JIASL). The basic idea of JIASL is straightforward: build an emotion-discriminative and corpus-invariant linear regression model under an implicit distribution alignment strategy. Following this idea, we first use the source speech features and emotion labels to endow the regression model with emotion-discriminative ability. Then, a reconstruction regularization term that jointly considers the marginal and conditional distribution alignments between the speech samples of both corpora is adopted to implicitly enable the regression model to predict the emotion labels of target speech samples. To evaluate JIASL, extensive cross-corpus SER experiments are carried out, and the results demonstrate its promising performance on cross-corpus SER tasks.
engineering, electrical & electronic; computer science, information systems; physics, applied
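The abstract describes a linear regression model trained on labeled source features and regularized so that the marginal and conditional distributions of the two corpora are aligned. The sketch below illustrates that general idea only; it is not the JIASL objective. It swaps the paper's implicit reconstruction regularizer for a simpler, explicit penalty on projected mean differences (overall means for the marginal term, per-class means with target pseudo-labels for the conditional term). The function name `fit_aligned_regression`, the parameters `lam`, `mu`, and `n_iter`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_aligned_regression(Xs, ys, Xt, n_classes, lam=1.0, mu=1.0, n_iter=5):
    """Ridge regression to one-hot labels plus a mean-alignment penalty.

    Hypothetical stand-in for the idea in the abstract, NOT the paper's method.
    Minimizes ||Xs W - Ys||^2 + lam ||W||^2 + mu ||D W||^2, where the rows of D
    are source-minus-target mean differences (overall and per class).
    Xs: (n_s, d) source features, ys: (n_s,) integer labels, Xt: (n_t, d).
    """
    d = Xs.shape[1]
    Ys = np.eye(n_classes)[ys]                 # one-hot source labels
    gram = Xs.T @ Xs + lam * np.eye(d)         # positive definite for lam > 0
    rhs = Xs.T @ Ys

    # Marginal term: difference between overall source and target means.
    marginal = Xs.mean(axis=0) - Xt.mean(axis=0)
    W = np.linalg.solve(gram + mu * np.outer(marginal, marginal), rhs)

    for _ in range(n_iter):
        # Target labels are unknown, so use the current model's predictions.
        yt_pseudo = (Xt @ W).argmax(axis=1)
        rows = [marginal]
        # Conditional term: per-class mean differences under pseudo-labels.
        for c in range(n_classes):
            src_c, tgt_c = Xs[ys == c], Xt[yt_pseudo == c]
            if len(src_c) and len(tgt_c):
                rows.append(src_c.mean(axis=0) - tgt_c.mean(axis=0))
        D = np.stack(rows)
        W = np.linalg.solve(gram + mu * (D.T @ D), rhs)   # closed-form update
    return W


# Usage on synthetic data (assumed): the target corpus is a shifted copy
# of the source feature distribution.
rng = np.random.default_rng(0)
centers = 3.0 * rng.normal(size=(3, 10))
ys = rng.integers(0, 3, size=150)
yt = rng.integers(0, 3, size=120)
Xs = centers[ys] + rng.normal(size=(150, 10))
Xt = centers[yt] + 0.5 + rng.normal(size=(120, 10))

W = fit_aligned_regression(Xs, ys, Xt, n_classes=3)
target_pred = (Xt @ W).argmax(axis=1)          # predicted target emotion labels
```

Each update has a closed form because every term is quadratic in `W`; the loop exists only to refresh the target pseudo-labels that the conditional term depends on. The sketch omits an intercept and any feature normalization, both of which a practical pipeline would likely add.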