SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults

Alessandra Grossi, Francesca Gasparini
2023-12-14
Abstract: This paper presents SER_AMPEL, a multi-source dataset for speech emotion recognition (SER). What distinguishes the dataset is that it was collected to provide a reference for speech emotion recognition in Italian older adults. The dataset was collected following different protocols: acted conversations extracted from movies and TV series, and recorded natural conversations in which emotions are elicited by suitable questions. The need for such a dataset emerges from an analysis of the state of the art. Preliminary considerations on the critical issues of SER are reported, based on classification results on a subset of the proposed dataset.
Audio and Speech Processing, Computation and Language, Sound
What problem does this paper attempt to address?
The paper mainly addresses the following issues:

1. **The need for emotion recognition in older adults**: As the population ages and society pays more attention to the well-being of older people, reducing loneliness and social isolation has become an important goal. Technology, and social robots in particular, can help mitigate these effects, and emotion recognition plays a key role in this process.
2. **Limitations of existing datasets**: Current datasets for speech emotion recognition (SER) have several shortcomings:
   - most consist of dialogues performed by actors simulating different emotional states, rather than natural conversations;
   - the few datasets that include natural conversations usually do not cover older speakers;
   - datasets are predominantly in English, with little coverage of other languages;
   - research and datasets dedicated to SER in older adults are scarce.
3. **Development of the SER_AMPEL dataset**: To overcome these limitations, the authors built SER_AMPEL, a multi-source dataset intended as a reference for emotion recognition in Italian older adults. It includes three subsets: NOLD (natural conversations of older adults, with emotions elicited through questions), NYNG (natural conversations of young adults, collected in a similar way), and AOLD (dialogue segments of older adults extracted from movies and TV series). Together, the subsets vary in the speakers' age group, the collection method, and the degree of naturalness.
4. **Model training and evaluation**: A baseline emotion recognition model was built with XGBoost, and domain adaptation strategies (such as KLIEP and TrAdaBoost) were tried in order to improve its performance across languages and age groups. The experimental results showed that the model performed poorly on AOLD, the subset of Italian older adults' dialogues, which may be related to differences in the language of the training data and in the type of dialogue.

In summary, the paper addresses the limited applicability and diversity of existing SER datasets for older adults by building a speech emotion recognition dataset specifically for Italian older adults, with the aim of advancing research and technological applications in this field.
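KLIEP (Kullback-Leibler Importance Estimation Procedure), one of the domain adaptation strategies mentioned above, is an instance-weighting method for covariate shift: it estimates the density ratio w(x) = p_target(x) / p_source(x) directly, so that source samples resembling the target domain count more during training. The pure-Python function below is a minimal sketch under stated assumptions, not the authors' implementation: it uses a Gaussian kernel, takes the target samples as kernel centres, and fixes the bandwidth `sigma`, step size `lr`, and iteration count `n_iter` by hand (the original procedure selects the bandwidth by likelihood cross-validation). The names `gaussian_kernel` and `kliep_weights` are hypothetical.

```python
import math


def gaussian_kernel(x, c, sigma):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kliep_weights(source, target, sigma=1.0, lr=1e-3, n_iter=500):
    """Estimate importance weights w(x) ~ p_target(x) / p_source(x) for each source sample.

    The ratio is modelled as w(x) = sum_l alpha_l * K(x, c_l). KLIEP maximises the
    mean of log w(x) over the target samples subject to alpha >= 0 and the mean of
    w(x) over the source samples being 1.
    Assumptions (sketch only): Gaussian kernel, target samples used as the centres
    c_l, and hand-picked sigma, lr and n_iter instead of cross-validation.
    """
    centres = target
    m = len(centres)
    # A[j][l] = K(target_j, centre_l)
    A = [[gaussian_kernel(t, c, sigma) for c in centres] for t in target]
    # b[l] = mean over source samples of K(source_i, centre_l)
    b = [sum(gaussian_kernel(s, c, sigma) for s in source) / len(source)
         for c in centres]
    b_sq = sum(bl * bl for bl in b)

    def normalise(alpha):
        # Rescale so the constraint mean_i w(source_i) = b . alpha = 1 holds exactly.
        s = sum(bl * al for bl, al in zip(b, alpha))
        return [al / s for al in alpha]

    alpha = normalise([1.0] * m)
    for _ in range(n_iter):
        # Gradient ascent on sum_j log (A alpha)_j; its gradient is A^T (1 / (A alpha)).
        a_alpha = [sum(a * al for a, al in zip(row, alpha)) for row in A]
        grad = [sum(A[j][l] / a_alpha[j] for j in range(len(target)))
                for l in range(m)]
        alpha = [al + lr * g for al, g in zip(alpha, grad)]
        # Project onto b . alpha = 1, clip to alpha >= 0, then renormalise.
        gap = 1.0 - sum(bl * al for bl, al in zip(b, alpha))
        alpha = [max(0.0, al + gap * bl / b_sq) for al, bl in zip(alpha, b)]
        alpha = normalise(alpha)

    # Evaluate the fitted ratio model at every source sample.
    return [sum(al * gaussian_kernel(x, c, sigma) for al, c in zip(alpha, centres))
            for x in source]
```

Because the returned weights have mean 1 over the source samples, they can be used directly as per-sample training weights, for example through the `sample_weight` argument of `xgboost.XGBClassifier.fit`. Treating another language or age group as the source domain and Italian older adults as the target domain is an illustrative assumption, not a description of the paper's exact pipeline.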
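TrAdaBoost, the other strategy named above, takes a boosting route instead of density-ratio estimation: it trains on the union of labelled source and target samples, shrinks the weight of source samples that the current weak learner misclassifies, and, as in AdaBoost, increases the weight of misclassified target samples. The sketch below follows the original binary formulation (Dai et al., 2007) under stated assumptions: decision stumps are a hypothetical choice of weak learner, the target error is clipped instead of stopping when it reaches 1/2, and `stump_predict`, `fit_stump`, and `tradaboost` are hypothetical names, not the authors' code.

```python
import math


def stump_predict(stump, x):
    """Apply a decision stump (feature index, threshold, polarity) to one sample."""
    feature, threshold, polarity = stump
    above = x[feature] > threshold
    return int(above) if polarity == 1 else int(not above)


def fit_stump(X, y, p):
    """Return the stump with the lowest p-weighted 0/1 error on (X, y)."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for threshold in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                stump = (f, threshold, polarity)
                err = sum(pi for xi, yi, pi in zip(X, y, p)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best


def tradaboost(X_src, y_src, X_tgt, y_tgt, n_rounds=10):
    """Binary TrAdaBoost with decision stumps (hypothetical weak learner); labels are 0/1.

    Returns a predict(x) function built from the second half of the rounds.
    """
    n, m = len(X_src), len(X_tgt)
    X, y = X_src + X_tgt, y_src + y_tgt
    w = [1.0] * (n + m)
    # Fixed factor used to down-weight misclassified source samples.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        total = sum(w)
        h = fit_stump(X, y, [wi / total for wi in w])
        miss = [abs(stump_predict(h, xi) - yi) for xi, yi in zip(X, y)]
        # The weighted error is measured on the target samples only.
        w_tgt = w[n:]
        eps = sum(wi * mi for wi, mi in zip(w_tgt, miss[n:])) / sum(w_tgt)
        # Assumption: clip eps so beta_t stays in (0, 1) instead of stopping early.
        eps = min(max(eps, 1e-10), 0.499)
        beta_t = eps / (1.0 - eps)
        for i in range(n):
            w[i] *= beta_src ** miss[i]  # shrink misclassified source samples
        for i in range(n, n + m):
            w[i] *= beta_t ** (-miss[i])  # grow misclassified target samples
        learners.append(h)
        betas.append(beta_t)

    # Rounds ceil(N/2)..N (1-based) vote, each weighted by log(1 / beta_t).
    voters = list(zip(learners, betas))[math.ceil(n_rounds / 2) - 1:]
    cutoff = 0.5 * sum(math.log(1.0 / b) for _, b in voters)

    def predict(x):
        score = sum(stump_predict(h, x) * math.log(1.0 / b) for h, b in voters)
        return int(score >= cutoff)

    return predict
```

The final vote is the log form of the original rule, which outputs 1 when the product of beta_t^(-h_t(x)) over the voting rounds is at least the product of beta_t^(-1/2). Emotion labels are usually multi-class, so applying this to SER would need an extension of the binary scheme (for example one-vs-rest), which this sketch does not attempt.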