Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung
2023-06-26
Abstract: Speech emotion recognition plays a crucial role in human-computer interactions. However, most speech emotion recognition research is biased toward English-speaking adults, which hinders its applicability to other demographic groups in different languages and age groups. In this work, we analyze the transferability of emotion recognition across three languages (English, Mandarin Chinese, and Cantonese) and two age groups (adults and the elderly). To conduct the experiment, we develop an English-Mandarin speech emotion benchmark for adults and the elderly, BiMotion, and a Cantonese speech emotion dataset, YueMotion. This study concludes that different language and age groups require specific speech features, thus making cross-lingual inference an unsuitable method. However, cross-group data augmentation is still beneficial to regularize the model, with linguistic distance being a significant influence on cross-lingual transferability. We publicly release our code at https://github.com/HLTCHKUST/elderly_ser.
Computation and Language, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the accuracy of speech emotion recognition for elderly speakers of low-resource languages. Most existing speech emotion recognition research is biased toward English-speaking adults, which limits its applicability to other languages and age groups. The paper therefore analyzes how well emotion recognition transfers across three languages (English, Mandarin Chinese, and Cantonese) and two age groups (adults and the elderly). To run the experiments, the authors developed BiMotion, an English-Mandarin speech emotion benchmark covering adults and the elderly, and YueMotion, a Cantonese speech emotion dataset. With these datasets, they study the transfer of emotion recognition across languages and age groups, and in particular how multilingual pre-trained speech models can be used to improve emotion recognition for the elderly in low-resource languages.