Cross-corpus Speech Emotion Recognition Based on Joint Transfer Subspace Learning and Regression
Weijian Zhang, Peng Song, Dongliang Chen, Chao Sheng, Wenjing Zhang
DOI: https://doi.org/10.1109/tcds.2021.3055524
IF: 4.546
2021-01-01
IEEE Transactions on Cognitive and Developmental Systems
Abstract: Speech emotion recognition has become an attractive research topic because speech signals carry a wide variety of emotional states in real-life scenarios. Most current speech emotion recognition methods are trained and tested on a single corpus. In practice, however, the training and testing data often come from different domains, e.g., different corpora. In this case, model generalizability and recognition performance degrade greatly because of the domain mismatch. To address this challenging problem, we present a transfer learning method, called joint transfer subspace learning and regression (JTSLR), for cross-corpus speech emotion recognition. Specifically, JTSLR performs transfer subspace learning and regression in a joint framework. First, we learn a latent subspace by introducing a discriminative maximum mean discrepancy (MMD) as the discrepancy metric. Then, we put forward a regression function in this latent subspace to describe the relationships between features and their corresponding labels. Moreover, we present a label graph to help transfer knowledge from relevant source data to target data. Finally, we conduct extensive experiments on three popular emotional data sets. The results show that our method can outperform traditional methods and some state-of-the-art transfer learning algorithms on cross-corpus speech emotion recognition tasks.
robotics; computer science, artificial intelligence; neurosciences
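
The maximum mean discrepancy (MMD) mentioned in the abstract measures how far apart two sets of samples are by comparing their mean embeddings. The sketch below is only an illustration of the plain, linear-kernel form of that idea, in which the squared MMD reduces to the squared Euclidean distance between the mean feature vectors of the source and target corpora. It is not the discriminative MMD or the JTSLR objective from the paper, which the abstract does not specify. The function names and the toy feature values are hypothetical.

```python
from typing import List, Sequence


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Return the mean of each feature dimension over a set of samples."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(dim)]


def linear_mmd_squared(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Squared MMD with a linear kernel (an assumed simplification).

    With a linear kernel, the squared MMD equals the squared Euclidean
    distance between the two sample means.
    """
    mu_s = column_means(source)
    mu_t = column_means(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


# Made-up two-dimensional "features" standing in for two corpora.
source = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 2.0], [3.0, 4.0]]

shifted = linear_mmd_squared(source, target)  # corpora with different means
same = linear_mmd_squared(source, source)     # identical corpora
print(shifted, same)
```

A larger value indicates a larger mismatch between the two corpora under this simplified measure. A subspace learning method of the kind the abstract describes would look for a projection of the features that makes such a discrepancy small.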