Abstract: This article focuses on the cross-corpus speech emotion recognition (SER) task. To address the mismatch between the distribution of the training (source) samples and that of the testing (target) samples, we propose a non-negative matrix factorization based transfer subspace learning method (NMFTSL). Our method seeks a shared feature subspace for the source and target corpora in which the discrepancy between the two distributions is reduced as far as possible and the corpus-specific (individual) components are excluded, so that knowledge from the source corpus can be transferred to the target corpus. Specifically, in this induced subspace, we minimize the distances between both the marginal distributions and the conditional distributions, with both distances measured by the maximum mean discrepancy (MMD) criterion. To estimate the conditional distribution of the target corpus, whose labels are unavailable, we integrate the prediction of target labels and the learning of the feature representation into a joint learning model. In addition, we introduce a difference loss that excludes the individual components from the shared subspace, which further reduces the mutual interference between the source and target individual components. We also propose a discrimination loss that incorporates label information into the shared subspace, which improves the discriminative ability of the feature representation. We provide a solution to the corresponding optimization problem. To evaluate our method, we construct 30 cross-corpus SER schemes using 6 popular speech emotion corpora. Experimental results show that our approach achieves better overall performance than state-of-the-art methods.
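The two distances named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear kernel, under which the squared MMD reduces to the squared distance between sample means, and it assumes that pseudo-labels for the target corpus are already available. The function names (`marginal_mmd`, `conditional_mmd`) and the toy feature vectors are hypothetical, and the NMF factorization, difference loss, and discrimination loss are not shown.

```python
from collections import defaultdict


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def marginal_mmd(source, target):
    """Squared MMD under a linear kernel (assumption): distance between corpus means."""
    return squared_distance(mean_vector(source), mean_vector(target))


def conditional_mmd(source, source_labels, target, target_pseudo_labels):
    """Sum of per-class linear-kernel MMDs.

    Target labels are pseudo-labels (assumption); classes that are missing
    from either corpus are skipped.
    """
    by_class_s = defaultdict(list)
    by_class_t = defaultdict(list)
    for x, y in zip(source, source_labels):
        by_class_s[y].append(x)
    for x, y in zip(target, target_pseudo_labels):
        by_class_t[y].append(x)
    shared = set(by_class_s) & set(by_class_t)
    return sum(marginal_mmd(by_class_s[c], by_class_t[c]) for c in shared)


# Toy subspace features (hypothetical values, two emotion classes).
source = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [2.0, 4.0]]
source_labels = ["neutral", "neutral", "angry", "angry"]
target = [[1.0, 1.0], [3.0, 1.0], [1.0, 5.0], [3.0, 5.0]]
target_pseudo_labels = ["neutral", "neutral", "angry", "angry"]

marginal = marginal_mmd(source, target)
conditional = conditional_mmd(source, source_labels, target, target_pseudo_labels)
print(marginal, conditional)
```

In a joint model of the kind the abstract describes, the pseudo-labels would be re-estimated as the feature representation is updated, so a class-wise term like this one would be recomputed at each iteration rather than once.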

Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition