Abstract: Cross-corpus speech emotion recognition (SER) is challenging because feature distributions differ between corpora, and this mismatch can degrade the performance of established SER methods. In this paper, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific knowledge related to SER and simply treat cross-corpus SER as a generic transfer learning task, our AKTLR method is built upon a well-designed acoustic knowledge-guided dual sparsity constraint mechanism. This mechanism draws on empirically validated acoustic knowledge in SER: minimalistic acoustic parameter feature sets can alleviate classifier over-adaptation and therefore generalize better in cross-corpus SER tasks than large feature sets. Through this mechanism, we extend a simple transfer linear regression model to AKTLR. The extended model seeks emotion-discriminative and corpus-invariant features from established acoustic parameter feature sets, which describe speech signals, at two scales: contributive acoustic parameter groups and the constituent elements within each contributive group. Our proposed method is evaluated through extensive cross-corpus SER experiments on three widely used speech emotion corpora: EmoDB, eNTERFACE, and CASIA. The results confirm the effectiveness of our method, which outperforms recent state-of-the-art transfer subspace learning and deep transfer learning-based cross-corpus SER methods. Furthermore, our work provides experimental evidence supporting the feasibility and superiority of incorporating domain-specific knowledge into the transfer learning model to address cross-corpus SER tasks.
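To make the idea of a two-scale sparsity constraint concrete, the sketch below computes a generic sparse-group-lasso-style penalty on a regression matrix: one term acts on whole feature groups and one acts on individual entries. This is only an illustration under stated assumptions, not the AKTLR objective, which the abstract does not specify. The function name `dual_sparsity_penalty`, the weights `lam_group` and `lam_elem`, the grouping, and the toy matrix are all hypothetical.

```python
import numpy as np


def dual_sparsity_penalty(W, groups, lam_group=1.0, lam_elem=1.0):
    """Generic two-scale sparsity penalty on W (features x emotion classes).

    groups: list of row-index arrays, one per hypothetical acoustic
    parameter group. This is an assumed, illustrative form, not the
    regularizer defined in the paper.
    """
    # Group scale: sum of Frobenius norms of each group's rows.
    # Minimizing it tends to zero out entire groups.
    group_term = sum(np.linalg.norm(W[idx, :]) for idx in groups)
    # Element scale: L1 norm over all entries.
    # Minimizing it tends to zero out single entries inside kept groups.
    elem_term = np.abs(W).sum()
    return lam_group * group_term + lam_elem * elem_term


# Toy example: 4 features split into 3 hypothetical groups, 2 classes.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.0, 0.0],
              [1.0, 0.0]])
groups = [np.array([0, 1]), np.array([2]), np.array([3])]
penalty = dual_sparsity_penalty(W, groups)
```

In a full model, a penalty of this kind would be added to a regression loss and a domain-discrepancy term; the second group here contributes nothing because all of its rows are already zero.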