Abstract: This paper presents SER_AMPEL, a multi-source dataset for speech emotion recognition (SER). The dataset is distinctive in that it was collected to provide a reference for speech emotion recognition in Italian older adults. It was collected following different protocols: acted conversations extracted from movies and TV series, and recordings of natural conversations in which emotions are elicited by suitable questions. An analysis of the state of the art shows the need for such a dataset. Preliminary considerations on the critical issues of SER are reported, based on classification results on a subset of the proposed dataset.

What problem does this paper attempt to address?

The paper mainly aims to address the following issues:

1. **The need for emotion recognition in the elderly**: With the growth of the aging population and the increased societal focus on the well-being of the elderly, reducing feelings of loneliness and social isolation has become an important topic. Technology, especially social robots, can mitigate these negative effects, and emotion recognition plays a key role in this process.
2. **Limitations of existing datasets**: Current datasets used for Speech Emotion Recognition (SER) have several issues:
   - Most datasets consist of dialogues simulated by actors in different emotional states, rather than natural dialogues;
   - The few datasets that include natural dialogues usually do not cover the elderly population;
   - Datasets are predominantly in English, with insufficient support for other languages;
   - There is a lack of research and datasets specifically for speech emotion recognition in the elderly.
3. **Development of the SER_AMPEL dataset**: To overcome the above limitations, the authors developed SER_AMPEL, a multi-source dataset aimed at providing a reference for emotion recognition in Italian older adults. The dataset includes three subsets: NOLD (elderly people in natural dialogues elicited through questions), NYNG (natural dialogues of young people obtained in a similar manner), and AOLD (dialogue segments of elderly people extracted from movies and TV series). Together, the subsets cover different age groups, different collection methods, and different degrees of naturalness.
4. **Model training and evaluation**: A baseline emotion recognition model was built with XGBoost, and domain adaptation strategies (such as KLIEP and TrAdaBoost) were applied in an attempt to improve the model's performance across languages and age groups. Experimental results showed that the model performed poorly on the AOLD subset of Italian elderly dialogues, which may be related to differences in the language of the training data and the type of dialogues.

In summary, the paper addresses the limited applicability and diversity of existing datasets for the elderly population by establishing a speech emotion recognition dataset designed specifically for Italian older adults, with the goal of advancing research and applications in this field.
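To make the domain adaptation step in point 4 more concrete, the following is a minimal sketch of KLIEP-style importance weighting on toy one-dimensional data. It is not the authors' implementation: the function name `kliep_weights`, the Gaussian kernel width, the learning rate, the synthetic "source" and "target" samples, and the simplified clip-and-rescale projection are all assumptions made for illustration. The idea is to estimate a weight for each source-domain sample (for example, speech from a different language or age group) so that the weighted source data resembles the target domain.

```python
import math
import random


def gaussian_kernel(x, c, sigma):
    """Gaussian kernel between a scalar sample x and a scalar center c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def kliep_weights(source, target, sigma=1.0, n_centers=20,
                  lr=0.05, n_iter=200, seed=0):
    """Estimate importance weights w(x) ~ p_target(x) / p_source(x).

    The weight function is modeled as w(x) = sum_l alpha_l * K(x, c_l),
    with centers c_l drawn from the target sample. The coefficients are
    fitted by projected gradient ascent on the target log-likelihood.
    The projection used here (clip, then rescale) is a simplification.
    """
    rng = random.Random(seed)
    centers = rng.sample(target, min(n_centers, len(target)))

    # Kernel values of every target sample against every center.
    k_target = [[gaussian_kernel(x, c, sigma) for c in centers]
                for x in target]
    # Mean kernel value of the source sample for each center.
    b = [sum(gaussian_kernel(x, c, sigma) for x in source) / len(source)
         for c in centers]

    def project(alpha):
        # Enforce alpha >= 0 and mean source weight == 1.
        alpha = [max(0.0, a) for a in alpha]
        norm = sum(a * bl for a, bl in zip(alpha, b))
        return [a / norm for a in alpha]

    alpha = project([1.0] * len(centers))
    for _ in range(n_iter):
        # Gradient of mean_j log w(x_j) over target samples x_j.
        grad = [0.0] * len(centers)
        for row in k_target:
            w = sum(a * k for a, k in zip(alpha, row))
            for l, k in enumerate(row):
                grad[l] += k / (w * len(target))
        alpha = project([a + lr * g for a, g in zip(alpha, grad)])

    # Evaluate the fitted weight function on the source samples.
    return [sum(a * gaussian_kernel(x, c, sigma)
                for a, c in zip(alpha, centers))
            for x in source]


# Toy data (assumption): a single feature shifted between domains.
_rng = random.Random(0)
source = [_rng.gauss(0.0, 1.0) for _ in range(200)]
target = [_rng.gauss(1.5, 1.0) for _ in range(100)]
weights = kliep_weights(source, target)
```

In a real pipeline the samples would be multi-dimensional acoustic feature vectors rather than scalars, and the resulting weights could be passed as per-sample weights when fitting the classifier, for instance through the `sample_weight` argument that XGBoost's scikit-learn-style `fit` accepts. Whether the paper follows this exact procedure is not stated in the summary above.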

SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults

Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset

EMOVOME: A Dataset for Emotion Recognition in Spontaneous Real-Life Speech

Speaker-Independent Speech Emotion Recognition Based on CNN-BLSTM and Multiple SVMs

Semi-supervised cross-lingual speech emotion recognition

Speech emotion recognition via graph-based representations

ERIT Lightweight Multimodal Dataset for Elderly Emotion Recognition and Multimodal Fusion Evaluation

EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

Speech Emotion Recognition Based on Clustering Assistance

Enhancing speech emotion recognition through deep learning and handcrafted feature fusion

Speech Emotion Recognition Systems: A Comprehensive Review on Different Methodologies

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

A New Amharic Speech Emotion Dataset and Classification Benchmark

SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios

What Does it Take to Generalize SER Model Across Datasets? A Comprehensive Benchmark

Embedded Emotions -- A Data Driven Approach to Learn Transferable Feature Representations from Raw Speech Input for Emotion Recognition

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

End-to-End Continuous Speech Emotion Recognition in Real-life Customer Service Call Center Conversations

Cross-corpus speech emotion recognition with transformers: Leveraging handcrafted features and data augmentation

SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition

An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition