All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang

2024-07-22

Abstract: In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UMBEnet includes a Dual-Stream (DS) structure that fuses inherent prompts with a Prompt Pool and a Sparse Feature Fusion (SFF) module. The design of the Prompt Pool is aimed at integrating information from different modalities, while inherent prompts are intended to enhance the system's predictive guidance capabilities and effectively manage knowledge related to emotion classification. Moreover, considering the sparsity of effective information across different modalities, the SFF module aims to make full use of all available sensory data through the sparse integration of modality fusion prompts and inherent prompts, maintaining high adaptability and sensitivity to complex emotional states. Extensive experiments on the largest benchmark datasets in the Dynamic Facial Expression Recognition (DFER) field, including DFEW, FERV39k, and MAFW, have proven that UMBEnet consistently outperforms the current state-of-the-art methods. Notably, in scenarios of Modality Missingness and multimodal contexts, UMBEnet significantly surpasses the leading current methods, demonstrating outstanding performance and adaptability in tasks that involve complex emotional understanding with rich multimodal information.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to make full use of information from multiple sensory modalities in affective computing, especially when certain modalities are absent, in order to achieve a comprehensive understanding and processing of human emotions. Specifically, the paper focuses on the following aspects:

1. **Multi-modal fusion**: In affective computing tasks, how to effectively fuse information from different sensory modalities (such as vision, language, and audition) to improve the accuracy and robustness of emotion recognition.
2. **Modality Missingness**: In real-world affective computing tasks, the information of certain modalities may be unavailable or missing. How can high emotion recognition performance still be maintained in this situation?
3. **Cross-Modal Plasticity**: In the human brain, when a certain sensory modality is missing, the processing ability of the other modalities is enhanced. Inspired by this cross-modal plasticity, the paper attempts to simulate the process to improve the adaptability and flexibility of the system.

To solve the above problems, the paper proposes a brain-inspired unified modal emotion processing network named UMBEnet. The main designs of UMBEnet include:

- **Dual-Stream (DS)**: It combines the Prompt Pool and Inherent Prompts. By fusing the multi-modal prompts in the Prompt Pool with the Inherent Prompts, it makes comprehensive use of multi-modal information.
- **Sparse Feature Fusion (SFF)**: Through sparse matrix fusion technology, it efficiently fuses multi-modal information and improves the robustness and accuracy of the system in complex emotion tasks.

The paper verifies the effectiveness of UMBEnet in the field of Dynamic Facial Expression Recognition (DFER) through a large number of experiments. In modality-missing and multi-modal settings in particular, UMBEnet significantly outperforms the existing state-of-the-art methods.
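To make the idea of sparse fusion under missing modalities more concrete, here is a minimal sketch in plain Python. It is not the paper's SFF module. The function name `sparse_fuse`, the dot-product scoring, the top-k selection, and the residual addition are all assumptions chosen for illustration. The sketch only shows one plausible way that a fixed "inherent" prompt could be combined with whichever modality prompts happen to be available.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_fuse(
    modality_prompts: Dict[str, Optional[Vector]],
    inherent_prompt: Vector,
    top_k: int = 2,
) -> Vector:
    """Hypothetical sparse fusion; an illustration, not the paper's SFF."""
    # Missing modalities are passed as None and contribute nothing.
    available = {m: v for m, v in modality_prompts.items() if v is not None}
    if not available:
        # With no sensory input, fall back to the inherent prompt alone.
        return list(inherent_prompt)

    # Score each available modality against the inherent prompt (assumed rule).
    scores = {m: dot(v, inherent_prompt) for m, v in available.items()}

    # Sparsify: keep only the top_k highest-scoring modalities.
    kept = sorted(scores, key=scores.get, reverse=True)[:top_k]

    # Softmax over the kept scores, shifted by the max for numerical stability.
    peak = max(scores[m] for m in kept)
    exps = {m: math.exp(scores[m] - peak) for m in kept}
    total = sum(exps.values())
    weights = {m: e / total for m, e in exps.items()}

    # Weighted sum of the kept prompts, added residually to the inherent prompt.
    fused = list(inherent_prompt)
    for m in kept:
        for i, x in enumerate(available[m]):
            fused[i] += weights[m] * x
    return fused


if __name__ == "__main__":
    # Audio is missing here, so vision receives the full fusion weight.
    print(sparse_fuse({"vision": [1.0, 0.0], "audio": None}, [0.5, 0.5]))
```

Because the weights are renormalized over the modalities that remain, dropping one input automatically shifts weight to the others. That is a loose analogue of the cross-modal plasticity described above, under the assumptions stated.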

An Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

MFDR: Multiple-stage Fusion and Dynamically Refined Network for Multimodal Emotion Recognition

Bridging the Emotional Semantic Gap via Multimodal Relevance Estimation

Emotion Recognition via Environmental Context and Human Body

MF-Net: a multimodal fusion network for emotion recognition based on multiple physiological signals

Temporal Convolutional Network-Enhanced Real-Time Implicit Emotion Recognition with an Innovative Wearable fNIRS-EEG Dual-Modal System

Investigating Multisensory Integration in Emotion Recognition Through Bio-Inspired Computational Models

A multimodal shared network with a cross-modal distribution constraint for continuous emotion recognition

Deep Emotional Arousal Network for Multimodal Sentiment Analysis and Emotion Recognition

A Novel Emotion-Aware Method Based on the Fusion of Textual Description of Speech, Body Movements, and Facial Expressions

Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features

ARMNet: A Network for Image Dimensional Emotion Prediction Based on Affective Region Extraction and Multi-Channel Fusion

Hierarchical multimodal-fusion of physiological signals for emotion recognition with scenario adaption and contrastive alignment

Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG

Multi-modal fusion network with complementarity and importance for emotion recognition

UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition

EffMulti: Efficiently Modeling Complex Multimodal Interactions for Emotion Analysis

Emotion recognition based on brain-like multimodal hierarchical perception

Multiplex graph aggregation and feature refinement for unsupervised incomplete multimodal emotion recognition