All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang
2024-07-22
Abstract: In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UMBEnet includes a Dual-Stream (DS) structure that fuses inherent prompts with a Prompt Pool and a Sparse Feature Fusion (SFF) module. The Prompt Pool is designed to integrate information from different modalities, while the inherent prompts are intended to enhance the system's predictive guidance capabilities and effectively manage knowledge related to emotion classification. Moreover, considering the sparsity of effective information across different modalities, the SFF module aims to make full use of all available sensory data through the sparse integration of modality fusion prompts and inherent prompts, maintaining high adaptability and sensitivity to complex emotional states. Extensive experiments on the largest benchmark datasets in the Dynamic Facial Expression Recognition (DFER) field, including DFEW, FERV39k, and MAFW, show that UMBEnet consistently outperforms the current state-of-the-art methods. Notably, in modality-missing and multimodal scenarios, UMBEnet significantly surpasses the leading current methods, demonstrating outstanding performance and adaptability in tasks that involve complex emotional understanding with rich multimodal information.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to make full use of information from multiple sensory modalities in affective computing, particularly when some modalities are absent, in order to understand and process human emotions comprehensively. Specifically, it focuses on the following aspects:

1. **Multi-modal fusion**: how to effectively fuse information from different sensory modalities (such as vision, language, and audio) to improve the accuracy and robustness of emotion recognition.
2. **Modality missingness**: in real-world affective computing tasks, some modalities may be unavailable or missing; the question is how to maintain high emotion recognition performance in that situation.
3. **Cross-modal plasticity**: in the human brain, when one sensory modality is missing, the processing ability of the remaining modalities is enhanced. The paper attempts to simulate this process to improve the adaptability and flexibility of the system.

To address these problems, the paper proposes UMBEnet, a brain-inspired unified modal emotion processing network. Its main designs are:

- **Dual-Stream (DS)**: combines the Prompt Pool with inherent prompts. By fusing the multi-modal prompts from the Prompt Pool with the inherent prompts, it draws on information from all available modalities.
- **Sparse Feature Fusion (SFF)**: uses sparse matrix fusion to fuse multi-modal information efficiently, improving the robustness and accuracy of the system in complex emotion tasks.

The paper verifies the effectiveness of UMBEnet on Dynamic Facial Expression Recognition (DFER) through extensive experiments. In the modality-missing and multi-modal settings in particular, UMBEnet significantly outperforms existing state-of-the-art methods.
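To make the idea of sparse fusion under modality missingness more concrete, the toy sketch below fuses whichever modality features are present with a set of inherent prompts, keeping only the top-k highest-scoring candidates. This is not the UMBEnet implementation: the function `sparse_fuse`, the top-k softmax weighting, the mean-based query, and all vectors are assumptions chosen purely for illustration, since the summary does not specify how SFF is computed.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_fuse(modal_feats, inherent_prompts, k=2):
    """Hypothetical sparse fusion; NOT the paper's SFF module.

    modal_feats: dict mapping modality name -> feature vector, or None if missing.
    inherent_prompts: list of prompt vectors with the same dimension.
    Returns (fused_vector, weights), where only k weights are non-zero.
    """
    # Missing modalities (None) are simply skipped.
    present = [v for v in modal_feats.values() if v is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    dim = len(present[0])

    # Assumed query: the mean of the available modality features.
    query = [sum(v[j] for v in present) / len(present) for j in range(dim)]

    # Candidates: available modality features followed by inherent prompts.
    candidates = present + list(inherent_prompts)
    scores = [dot(query, c) / math.sqrt(dim) for c in candidates]

    # Sparsity: keep the k best-scoring candidates, zero out the rest.
    k = min(k, len(candidates))
    keep = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax restricted to the kept candidates.
    top = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - top) for i in keep}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(candidates))]

    # Weighted sum of candidates, computed per dimension.
    fused = [dot(weights, [c[j] for c in candidates]) for j in range(dim)]
    return fused, weights


# Made-up example: the audio modality is missing.
feats = {"vision": [1.0, 0.0, 0.0, 0.0], "audio": None, "text": [0.0, 1.0, 0.0, 0.0]}
prompts = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
fused, weights = sparse_fuse(feats, prompts, k=2)
```

Because missing modalities are dropped before scoring, the same function handles one, two, or three available modalities without any change, which is the property the modality-missingness discussion above is concerned with.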