Abstract: Considerable attention has been paid to physiological signal-based emotion recognition in the field of affective computing. Owing to its reliability and user-friendly acquisition, electrodermal activity (EDA) has a great advantage in practical applications. However, EDA-based emotion recognition across a large number of subjects remains a difficult problem. Traditional well-designed classifiers built on hand-crafted features perform poorly because of their limited representational ability, while deep learning models with automatic feature extraction suffer from overfitting because of large individual differences between subjects. Since music is strongly correlated with human emotion, static music can serve as an external benchmark that constrains the varied, dynamic EDA signals. In this article, we fuse each subject's individual EDA features with features of the external music that evokes the emotion, and we propose an end-to-end multimodal framework, the one-dimensional residual temporal and channel attention network (RTCAN-1D). For EDA features, we introduce, for the first time in EDA-based emotion recognition, a channel-temporal attention mechanism to mine temporal and channel-wise dynamic and steady features. Comparisons with state-of-the-art (SOTA) EDA-only models on the DEAP and AMIGOS datasets demonstrate the effectiveness of RTCAN-1D in mining EDA features. For music features, we simply process the music signal with the open-source toolkit openSMILE to obtain external feature vectors. We conducted systematic and extensive evaluations. Experiments on PMEmo, currently the largest music emotion dataset, validate that fusing EDA and music is a reliable and efficient solution for large-scale emotion recognition.
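The abstract names a channel-temporal attention mechanism inside a residual one-dimensional network, plus a fusion of EDA features with openSMILE music vectors, but it does not give layer details. The pure-Python sketch below is only an illustration under stated assumptions: a simplified squeeze-and-excitation-style channel gate (one scalar weight per channel instead of learned fully connected layers), a softmax gate over time steps, an identity shortcut, global average pooling, and fusion by plain concatenation. Every function name, shape, and weight here is hypothetical and untrained; it is not the authors' RTCAN-1D.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # hypothetical EDA feature map: channels x time steps


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(x: Matrix, w: List[float], b: float = 0.0) -> Matrix:
    """Assumed SE-style channel gate: squeeze each channel by its mean over
    time, map it through sigmoid(w[c] * mean + b), and rescale the channel."""
    out = []
    for c, row in enumerate(x):
        mean_c = sum(row) / len(row)
        gate = sigmoid(w[c] * mean_c + b)
        out.append([v * gate for v in row])
    return out


def temporal_attention(x: Matrix) -> Matrix:
    """Assumed temporal gate: score each time step by its mean across
    channels, softmax the scores, and reweight every channel by them."""
    n_ch, n_t = len(x), len(x[0])
    scores = [sum(x[c][t] for c in range(n_ch)) / n_ch for t in range(n_t)]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    # Multiply by n_t so the weights average 1 and a flat sequence is unchanged.
    weights = [e / total * n_t for e in exps]
    return [[x[c][t] * weights[t] for t in range(n_t)] for c in range(n_ch)]


def residual_attention_block(x: Matrix, w: List[float]) -> Matrix:
    """Channel gate, then temporal gate, plus an identity (residual) shortcut."""
    attended = temporal_attention(channel_attention(x, w))
    return [[a + r for a, r in zip(a_row, x_row)] for a_row, x_row in zip(attended, x)]


def global_average_pool(x: Matrix) -> List[float]:
    """Collapse the time axis, leaving one value per channel."""
    return [sum(row) / len(row) for row in x]


def fuse(eda_embedding: List[float], music_features: List[float]) -> List[float]:
    """Late fusion by concatenating the EDA embedding with the music vector."""
    return eda_embedding + music_features


if __name__ == "__main__":
    random.seed(0)
    eda = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]  # 4 channels x 8 steps
    channel_w = [random.gauss(0.0, 1.0) for _ in range(4)]  # untrained, illustrative weights
    music = [random.gauss(0.0, 1.0) for _ in range(6)]  # stand-in for an openSMILE vector
    fused = fuse(global_average_pool(residual_attention_block(eda, channel_w)), music)
    print(len(fused))  # 10
```

Concatenation is the simplest fusion choice. A trained model would instead learn the gate weights (for example, with a deep learning framework) and feed the fused vector to a classifier head.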

Multi-Scale Approaches to the MediaEval 2015 "Emotion in Music" Task

An Efficient Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

A Deep Bidirectional Long Short-Term Memory Based Multi-Scale Approach for Music Dynamic Emotion Prediction

DBLSTM-based Multi-Scale Fusion for Dynamic Emotion Prediction in Music

Multi-scale Context Based Attention for Dynamic Music Emotion Prediction

SVR-Based Double-Scale Regression for Dynamic Emotion Prediction in Music

Using Psychophysiological Measures to Recognize Personal Music Emotional Experience

MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task Using Multi-level Regression

A Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Visual-Audio Emotion Recognition Based on Multi-Task and Ensemble Learning with Multiple Features

Beatsens' Solution for MediaEval 2014 Emotion in Music Task

ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition

Music-induced emotion flow modeling by ENMI Network

Multimodal Music Emotion Recognition with Hierarchical Cross-Modal Attention Network

Symbolic & Acoustic: Multi-domain Music Emotion Modeling for Instrumental Music

Continuous Multimodal Emotion Prediction Based on Long Short-Term Memory Recurrent Neural Network

Music emotion recognition based on temporal convolutional attention network using EEG

User-Adaptive Music Emotion Recognition

Multi-modal Continuous Dimensional Emotion Recognition Using Recurrent Neural Network and Self-Attention Mechanism

PKU-AIPL' Solution for MediaEval 2015 Emotion in Music Task

Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video