Multi-layer maximum mean discrepancy in auto-encoders for cross-corpus speech emotion recognition

Babak Nasersharif, Manije Ebrahimpour, Navid Naderi
DOI: https://doi.org/10.1007/s11227-023-05161-y
IF: 3.3
2023-03-21
The Journal of Supercomputing
Abstract: The performance of speech emotion recognition systems degrades when the training (source) and test (target) corpora are mismatched. Domain adaptation methods can address this problem. In this paper, we propose a deep domain adaptation method for ordinary and variational auto-encoders that extracts domain-invariant features for cross-corpus speech emotion recognition. We use one auto-encoder for each of the source and target domain datasets and train the auto-encoders with a domain adaptation loss in addition to the conventional loss. The domain adaptation loss is based on the maximum mean discrepancy between layers of the source and target auto-encoders; it brings the distributions of the source and target domain features closer together and yields a domain-invariant feature space. We report results on several emotional speech datasets used as source and target datasets, with an SVM classifier trained only on the extracted source features. Experimental results show that the proposed domain-adapted auto-encoder and variational auto-encoder improve cross-corpus speech emotion recognition accuracy compared with unadapted auto-encoders and other related methods.
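To make the loss described in the abstract concrete, here is a minimal sketch of a multi-layer maximum mean discrepancy (MMD) term added to the reconstruction losses of two auto-encoders. The abstract does not give the kernel, the layers that are compared, or the weighting, so the Gaussian kernel, the `gamma` and `weight` parameters, and all function names below are illustrative assumptions, not the authors' implementation.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel between two feature vectors (assumed kernel choice).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mmd2(source, target, gamma=1.0):
    # Biased estimate of the squared MMD between two batches of activations.
    def mean_kernel(a, b):
        total = sum(rbf_kernel(x, y, gamma) for x in a for y in b)
        return total / (len(a) * len(b))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

def multilayer_mmd(source_layers, target_layers, gamma=1.0):
    # Sum the squared MMD over corresponding layers of the two auto-encoders.
    return sum(mmd2(s, t, gamma) for s, t in zip(source_layers, target_layers))

def total_loss(recon_source, recon_target, source_layers, target_layers,
               weight=1.0, gamma=1.0):
    # Conventional reconstruction losses plus the weighted adaptation term
    # (the weighting scheme is hypothetical).
    adaptation = multilayer_mmd(source_layers, target_layers, gamma)
    return recon_source + recon_target + weight * adaptation

# Toy batches: one list of activation vectors per layer.
src_layers = [[[0.0], [0.0]], [[0.0, 1.0], [1.0, 0.0]]]
tgt_layers = [[[1.0], [1.0]], [[0.0, 1.0], [1.0, 0.0]]]
loss = total_loss(0.5, 0.25, src_layers, tgt_layers, weight=0.1)
```

In this toy example the second layer has identical source and target activations and contributes nothing, so only the first layer adds to the adaptation term. In an actual training setup the activations would come from the encoders' forward passes and the loss would be minimized by gradient descent.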
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture