Speech Emotion Recognition Based on Meta-Transfer Learning with Domain Adaption

Zhen-Tao Liu, Bao-Han Wu, Meng-Ting Han, Wei-Hua Cao, Min Wu
DOI: https://doi.org/10.1016/j.asoc.2023.110766
IF: 8.7
2023-01-01
Applied Soft Computing
Abstract: Deep learning often requires large amounts of labeled data to train a model, and such data are not always readily available in the field of speech emotion recognition (SER). Related work on SER under few-shot conditions has reported problems with overfitting and with domain transfer during training. In this study, a few-shot learning method based on meta-transfer learning with domain adaption (MTLDA) is proposed for SER. It not only effectively reduces the overfitting of deep neural networks (DNNs) trained with a small number of samples, but also solves the forgetting problem in meta-learning and the target-domain adaptability problem in transfer learning. Few-shot SER experiments are performed on three databases (CASIA for pre-training; Emo-DB and SAVEE for few-shot learning), yielding a weighted average recall (WAR) of 65.12% and an unweighted average recall (UAR) of 64.50% on Emo-DB, and a WAR of 58.84% and a UAR of 53.26% on SAVEE.
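The abstract reports results as WAR and UAR without defining them. As commonly used in SER, WAR (weighted average recall) equals overall accuracy, while UAR (unweighted average recall) is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. The following is a minimal sketch of both metrics; the function name war_uar and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) as fractions in [0, 1].

    WAR: correct predictions over all samples (overall accuracy).
    UAR: mean of per-class recalls, each class weighted equally.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Number of samples per true class.
    total = Counter(y_true)
    # Number of correctly classified samples per true class.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    war = sum(correct.values()) / len(y_true)
    uar = sum(correct[c] / total[c] for c in total) / len(total)
    return war, uar


# Toy labels invented for illustration (not data from the paper).
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "angry"]

war, uar = war_uar(y_true, y_pred)
print(f"WAR = {war:.4f}, UAR = {uar:.4f}")
```

With imbalanced classes the two values diverge, as in the toy example, which is why SER papers usually report both.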