Abstract: Unsupervised cross-corpus speech emotion recognition (SER) is the task in which the labeled training (source) speech and the unlabeled testing (target) speech come from different corpora. Transfer subspace learning is a mainstream approach to cross-corpus SER; it uses projection matrices to learn corpus-invariant feature representations shared by the source and target corpora. However, these methods mainly model the mapping from the input low-level descriptor (LLD) feature space to the corpus-invariant emotional feature space. This mapping is implicit and offers little interpretability regarding the selected features, so it cannot pinpoint which acoustic features are corpus-invariant. To bridge this gap, we first propose a new transfer subspace learning framework with feature selection capabilities, the Corpus-Invariant Emotional Acoustic Feature Seeker (CAFS). Specifically, CAFS integrates two core terms into the transfer regression loss function: (1) the emotion preservation term, which combines emotional regression with the ℓ2,1 norm to select features and ensure that the selected features are related to emotion; and (2) the corpus invariance preservation term, which measures the difference in feature distribution between the source and target corpora. Minimizing the second term narrows the gap between the source and target domains, ensuring that the chosen acoustic features are corpus-invariant. We then conduct extensive cross-corpus SER experiments to explore corpus-invariant emotional acoustic features under commonly used acoustic feature sets (IS09 and eGeMAPS). Statistical analysis of the acoustic features sought by the CAFS framework shows that some acoustic features (e.g., Mel-Frequency Cepstral Coefficients (MFCC) and formants) exhibit corpus-invariant properties, which could provide insights for feature selection in cross-corpus SER. These findings also provide a theoretical and empirical foundation for future research and applications in cross-corpus SER.
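
As a rough illustration of how the two terms described in the abstract could fit together, the following NumPy sketch evaluates one possible objective. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the least-squares form of the emotional regression, the use of a first-order mean discrepancy as the distribution difference, the trade-off weights `lam` and `mu`, and all function names.

```python
import numpy as np


def l21_norm(P):
    """l2,1 norm: the sum of the Euclidean norms of the rows of P."""
    return float(np.sum(np.linalg.norm(P, axis=1)))


def cafs_style_objective(P, Xs, Ys, Xt, lam=1.0, mu=1.0):
    """Evaluate a hypothetical CAFS-style loss (not the authors' exact loss).

    P  : (d, c) projection matrix, one row per acoustic feature
    Xs : (n_s, d) labeled source features
    Ys : (n_s, c) one-hot emotion labels for the source samples
    Xt : (n_t, d) unlabeled target features
    """
    # Emotion preservation: regress source features onto emotion labels ...
    regression = np.linalg.norm(Xs @ P - Ys, "fro") ** 2
    # ... and push whole rows of P toward zero, which deselects features.
    sparsity = l21_norm(P)
    # Corpus invariance (assumed form): squared distance between the
    # projected source mean and the projected target mean.
    mean_gap = Xs.mean(axis=0) - Xt.mean(axis=0)
    invariance = np.linalg.norm(mean_gap @ P) ** 2
    return float(regression + lam * sparsity + mu * invariance)


def rank_features(P):
    """Order feature indices by the row norm of P, largest first."""
    return np.argsort(-np.linalg.norm(P, axis=1))
```

Under this reading, a feature whose row in `P` has a large norm is one that the regression relies on and that survives both the row-sparsity penalty and the distribution-matching penalty, so `rank_features` gives a simple way to inspect which acoustic features were kept.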

Feature Selection Based Transfer Subspace Learning for Speech Emotion Recognition

Transfer Linear Subspace Learning for Cross-Corpus Speech Emotion Recognition

Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition

Joint Subspace Learning and Feature Selection Method for Speech Emotion Recognition

Transfer Sparse Discriminant Subspace Learning for Cross-Corpus Speech Emotion Recognition

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

A Generalized Subspace Distribution Adaptation Framework for Cross-Corpus Speech Emotion Recognition

Exploring Corpus-Invariant Emotional Acoustic Feature for Cross-Corpus Speech Emotion Recognition

Target-Adapted Subspace Learning for Cross-Corpus Speech Emotion Recognition

Joint Instance Reconstruction and Feature Subspace Alignment for Cross-Domain Speech Emotion Recognition

Cross-corpus Speech Emotion Recognition Based on Joint Transfer Subspace Learning and Regression

Transfer Subspace Learning for Unsupervised Cross-Corpus Speech Emotion Recognition

Cross-corpus speech emotion recognition using transfer semi-supervised discriminant analysis

Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach

DSTL: Solution to Limitation of Small Corpus in Speech Emotion Recognition

Cross-corpus Speech Emotion Recognition Using Subspace Learning and Domain Adaption

Cross-corpus Speech Emotion Recognition Based on a Feature Transfer Learning Method

Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition

Speech Emotion Recognition Based On Sparse Transfer Learning Method

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

Latent sparse transfer subspace learning for cross-corpus facial expression recognition