Abstract: Using multimodal signals for emotion recognition is one of the emerging trends in affective computing. Several studies have used state-of-the-art deep learning methods to classify emotions by combining physiological signals, such as the electroencephalogram (EEG), electrocardiogram (ECG), and skin temperature, with facial expressions, voice, and posture, to name a few. Spiking neural networks (SNNs) represent the third generation of neural networks and employ biologically plausible models of neurons. SNNs have been shown to handle spatio-temporal data efficiently, which is essentially the nature of the data encountered in the emotion recognition problem. In this work, for the first time, we propose applying SNNs to emotion recognition on a multimodal dataset. Specifically, we use the NeuCube framework, which employs an evolving SNN architecture, to classify emotional valence, and we evaluate the performance of our approach on the MAHNOB-HCI dataset. The multimodal data used in our work consists of facial expressions along with physiological signals such as ECG, skin temperature, skin conductance, respiration signal, mouth length, and pupil size. We perform classification under Leave-One-Subject-Out (LOSO) cross-validation. Our results show that the proposed approach achieves an accuracy of 73.15% for binary valence classification when applying feature-level fusion, which is comparable to that of deep learning methods. We achieve this accuracy without using EEG, which other deep learning methods have relied on to reach this level of accuracy. In conclusion, we demonstrate that SNNs can be successfully used for emotion recognition with multimodal data, and we provide directions for future research on SNNs for affective computing.
In addition to achieving good accuracy, the SNN recognition system is incrementally trainable on new data in an adaptive way. It requires only one-pass training, which makes it suitable for practical and online applications. These features are not manifested in other methods for this problem.
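The evaluation protocol named in the abstract, feature-level fusion followed by LOSO cross-validation, can be illustrated with a minimal sketch. It is not the authors' pipeline: the data is synthetic (not MAHNOB-HCI), the helper names `fuse_features` and `loso_splits` and the feature dimensions are assumptions made for illustration, fusion is assumed to be plain concatenation of per-trial feature vectors, and the NeuCube classifier itself is not modeled.

```python
import numpy as np

def fuse_features(modalities):
    """Feature-level fusion, assumed here to be simple concatenation
    of the per-trial feature vectors of each modality."""
    return np.concatenate(modalities, axis=1)

def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Synthetic stand-in data: 3 subjects with 2 trials each (hypothetical sizes).
rng = np.random.default_rng(0)
subjects = np.array([1, 1, 2, 2, 3, 3])
face = rng.normal(size=(6, 4))     # hypothetical facial-expression features
physio = rng.normal(size=(6, 3))   # hypothetical physiological features

fused = fuse_features([face, physio])  # one fused vector per trial
folds = list(loso_splits(subjects))

for subject, train_idx, test_idx in folds:
    # A classifier would be trained on fused[train_idx] and
    # evaluated on fused[test_idx]; that step is omitted here.
    print(f"held-out subject {subject}: train={train_idx}, test={test_idx}")
```

Each fold holds out every trial of one subject, so the reported accuracy reflects performance on people the model has never seen during training.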
