SMILE: Spiking Multi-Modal Interactive Label-Guided Enhancement Network for Emotion Recognition

Ming Guo, Wenrui Li, Chao Wang, Yuxin Ge, Chongjun Wang
DOI: https://doi.org/10.1109/icme57554.2024.10688152
2024-01-01
Abstract: Multi-modal multi-label emotion recognition has gained significant attention in the field of affective computing, as it uses multiple signals to distinguish complex emotions accurately. However, previous studies primarily focus on capturing invariant representations and neglect fluctuations in temporal information, which affect model robustness. In this paper, we propose a novel Spiking Multi-modal Interactive Label-guided Enhancement network (SMILE). It introduces a spiking neural network with dynamic thresholds, allowing flexible processing of temporal information to enhance model robustness. Furthermore, it employs scale spiking fusion to enrich semantic information. In addition to modality-specific refinement, SMILE integrates modality-interactive exploration and label-modality matching modules to capture multimodal interaction and label-modality dependence. Experimental results on the benchmark datasets CMU-MOSEI and NEMu demonstrate the superiority of SMILE over state-of-the-art models. Notably, SMILE achieves a significant 28.5% improvement in accuracy over the benchmark method when evaluated on the NEMu dataset.
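To make the idea of a spiking neuron with a dynamic threshold concrete, the following is a minimal sketch of a generic leaky integrate-and-fire neuron whose firing threshold rises after each spike and then relaxes back toward a baseline. It is not the SMILE formulation, which the abstract does not specify; the function name, the update rule, and every parameter (`decay`, `base_threshold`, `threshold_jump`, `threshold_decay`) are assumptions chosen purely for illustration.

```python
def lif_dynamic_threshold(inputs, decay=0.9, base_threshold=1.0,
                          threshold_jump=0.5, threshold_decay=0.8):
    """Simulate one leaky integrate-and-fire neuron with an adaptive threshold.

    Hypothetical illustration only; not the update rule used in SMILE.
    Returns (spikes, thresholds), one entry per input time step.
    """
    membrane = 0.0     # membrane potential
    adaptation = 0.0   # extra threshold accumulated from recent spikes
    spikes, thresholds = [], []
    for current in inputs:
        threshold = base_threshold + adaptation
        thresholds.append(threshold)
        # Leaky integration of the input current.
        membrane = decay * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after a spike
            # Raise the threshold so that rapid re-firing becomes harder.
            adaptation = threshold_decay * adaptation + threshold_jump
        else:
            spikes.append(0)
            # Without a spike, the threshold relaxes toward the baseline.
            adaptation = threshold_decay * adaptation
    return spikes, thresholds


if __name__ == "__main__":
    constant_input = [0.6] * 6
    adaptive, _ = lif_dynamic_threshold(constant_input)
    fixed, _ = lif_dynamic_threshold(constant_input, threshold_jump=0.0)
    print("adaptive threshold:", adaptive)
    print("fixed threshold:   ", fixed)
```

Under a constant input, the adaptive neuron in this sketch fires less often than the same neuron with `threshold_jump=0.0`, which is one simple way a threshold that depends on recent activity can change how a temporal signal is encoded.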