Abstract: In this study, we aim to determine if generalized sounds and music can share a common emotional space, improving predictions of emotion in terms of arousal and valence. We propose the use of multiple datasets as a multi-domain learning technique. Our approach involves creating a common space encompassing features that characterize both generalized sounds and music, as they can evoke emotions in a similar manner. To achieve this, we utilized two publicly available datasets, namely IADS-E and PMEmo, following a standardized experimental protocol. We employed a wide variety of features that capture diverse aspects of the audio structure including key parameters of spectrum, energy, and voicing. Subsequently, we performed joint learning on the common feature space, leveraging heterogeneous model architectures. Interestingly, this synergistic scheme outperforms the state-of-the-art in both sound and music emotion prediction. The code enabling full replication of the presented experimental pipeline is available at https://github.com/LIMUNIMI/MusicSoundEmotions.

What problem does this paper attempt to address?

This paper asks **whether generalized sounds and music can share a common emotional space, and whether modeling them jointly improves emotion prediction along the arousal and valence dimensions**. Specifically, the authors test two hypotheses:

1. **Generalized sounds and music can be modeled in a common emotional space**: a shared feature space describing both kinds of audio lets a single model be trained on them jointly.
2. **This multi-domain learning method improves emotion prediction**: in particular, predictions of arousal and valence can surpass existing methods.

To test these hypotheses, the authors used two publicly available datasets (IADS-E and PMEmo) and carried out joint learning on the audio features extracted from both. The features cover spectrum, energy, and voicing, and the experiments span several model architectures, including linear models and support vector regression. The authors report that the approach outperforms existing methods in emotion prediction, with the improvement most pronounced for arousal.

### Key Contributions

1. **A multi-domain learning strategy** that combines different types of audio data (music and generalized sounds) for audio emotion recognition (AER).
2. **New models** that surpass existing techniques in emotion recognition for both music and environmental sounds.
3. **A detailed analysis of how the proposed strategy affects different types of sounds**, showing its potential in practical applications.

### Method Overview

- **Datasets**: IADS-E and PMEmo, which cover generalized sounds and music respectively.
- **Feature Extraction**: the openSMILE toolkit, used to extract 6375 static features covering spectrum, energy, and voicing.
- **Model Selection and Validation**: ElasticNet, support vector regression (SVR), and AutoML, evaluated through 5-fold cross-validation.

With these methods, the authors show that generalized sounds and music can be modeled effectively in a common emotional space, and that multi-domain learning improves the accuracy of emotion prediction.
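The joint-learning protocol summarized above can be illustrated with a small sketch. It is not the authors' pipeline: the two "datasets" are synthetic stand-ins for IADS-E and PMEmo, a plain gradient-descent linear regressor stands in for ElasticNet, SVR, and AutoML, and every function name (`make_domain`, `joint_cv`, and so on) is hypothetical. It only shows the idea of pooling the training folds of both domains in one feature space and scoring each domain on its own held-out fold.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def predict(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_linear(X, y, lr=0.05, epochs=300, l2=1e-3):
    """L2-regularized linear regression fitted by batch gradient descent
    (an assumed stand-in for the models named in the summary)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict((w, b), xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def rmse(model, X, y):
    se = sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(X, y))
    return math.sqrt(se / len(y))

def make_domain(n, shift, seed):
    """Synthetic stand-in for one dataset: 4 features and a valence-like target."""
    rng = random.Random(seed)
    true_w = [0.5, -0.3, 0.2, 0.1]  # assumed shared feature-to-emotion mapping
    X = [[rng.gauss(shift, 1.0) for _ in true_w] for _ in range(n)]
    y = [sum(wj * xj for wj, xj in zip(true_w, xi)) + rng.gauss(0, 0.1) for xi in X]
    return X, y

def joint_cv(domains, k=5):
    """Per fold: train on the pooled training parts of all domains,
    then test on each domain's held-out part separately."""
    folds = {name: kfold_indices(len(y), k) for name, (_, y) in domains.items()}
    scores = {name: [] for name in domains}
    for f in range(k):
        X_tr, y_tr = [], []
        for name, (X, y) in domains.items():
            held = set(folds[name][f])
            X_tr += [x for i, x in enumerate(X) if i not in held]
            y_tr += [t for i, t in enumerate(y) if i not in held]
        model = fit_linear(X_tr, y_tr)
        for name, (X, y) in domains.items():
            test = folds[name][f]
            scores[name].append(rmse(model, [X[i] for i in test], [y[i] for i in test]))
    return {name: sum(v) / len(v) for name, v in scores.items()}

domains = {"sounds": make_domain(60, 0.0, seed=1), "music": make_domain(80, 0.5, seed=2)}
results = joint_cv(domains)
print(results)  # mean held-out RMSE per domain
```

To compare against single-domain training, the same `joint_cv` function could be called with a dictionary holding only one domain. In this toy setup both domains share one underlying mapping by construction, which is precisely the assumption the paper sets out to test on real data.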

Joint Learning of Emotions in Music and Generalized Sounds

Learning Music Emotion Primitives via Supervised Dynamic Clustering.

Image–Music Synesthesia-Aware Learning Based on Emotional Similarity Recognition

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Using Psychophysiological Measures to Recognize Personal Music Emotional Experience

Human-centric Music Medical Therapy Exploration System

A New Multilabel System for Automatic Music Emotion Recognition

Emotion-Aligned Contrastive Learning Between Images and Music

Study on Linguistic Computing for Music Emotion

Symbolic & Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music

Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings

Learning Affective Correspondence between Music and Image

A Deep Bidirectional Long Short-Term Memory Based Multi-Scale Approach for Music Dynamic Emotion Prediction

Real-Time Human-Music Emotional Interaction Based on Deep Learning and Multimodal Sentiment Analysis

Visual-Audio Emotion Recognition Based on Multi-Task and Ensemble Learning with Multiple Features

Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and latent Domain Adaptation

Multi-Scale Approaches to the MediaEval 2015 "Emotion in Music" Task

A Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Real-time Human-Music Emotional Interaction Based on Multimodal Analysis

A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition

Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges