Dense Graph Convolutional with Joint Cross-Attention Network for Multimodal Emotion Recognition

Cheng, Wenzhe Liu, Lin Feng, Ziyu Jia
DOI: https://doi.org/10.1109/tcss.2024.3412074
2024-01-01
Abstract: Multimodal emotion recognition (MER) has attracted much attention because it can leverage consistency and complementary relationships across multiple modalities. However, previous studies have mostly focused on the complementary information of multimodal signals, neglecting both their consistency information and the topological structure of each modality. To this end, we propose a dense graph convolution network (DGC) equipped with a joint cross-attention (JCA) module, named DG-JCA, for MER. The main advantage of the DG-JCA model is that it simultaneously integrates the spatial topology, consistency, and complementarity of multimodal data into a unified network framework. DG-JCA extends the graph convolution network (GCN) via a dense connection strategy and introduces cross-attention to jointly model the learned features from multiple modalities. Specifically, we first build a topology graph for each modality and then extract neighborhood features of the different modalities using a multilayer DGC driven by dense connections. Next, JCA performs intra- and intermodality cross-attention fusion based on each modality’s characteristics while balancing the contributions of the various modalities’ features. Finally, subject-dependent and subject-independent experiments on the DEAP and SEED-IV datasets are conducted to evaluate the proposed method. Extensive experimental results show that the proposed model can effectively extract and fuse multimodal features and achieves outstanding performance in comparison with several state-of-the-art approaches.
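The pipeline outlined in the abstract (a per-modality topology graph, densely connected graph convolution, then cross-attention over a joint representation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the node counts, feature sizes, random adjacency matrices, ReLU activation, mean pooling, and the helper names `normalize_adjacency`, `dense_gcn`, `cross_attention`, and `joint_fusion` are all assumptions made for illustration only.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}; assumes non-negative weights."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dense_gcn(x, adj, weights):
    """Densely connected graph convolution (illustrative).

    Each layer receives the concatenation of the input features and the
    outputs of all previous layers, and the final representation
    concatenates everything.
    """
    a_norm = normalize_adjacency(adj)
    feats = [x]
    for w in weights:
        h_in = np.concatenate(feats, axis=1)
        feats.append(np.maximum(a_norm @ h_in @ w, 0.0))  # ReLU (assumed)
    return np.concatenate(feats, axis=1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Scaled dot-product attention: queries from one set of nodes, keys/values from another."""
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    return scores @ v

def joint_fusion(h_a, h_b, w_q, w_k, w_v):
    """Each modality attends to the joint (stacked) node set, which covers both
    intra- and intermodality interactions; results are mean-pooled and concatenated.
    This fusion rule is an assumption, not the paper's exact JCA formulation."""
    joint = np.concatenate([h_a, h_b], axis=0)
    z_a = cross_attention(h_a, joint, w_q, w_k, w_v).mean(axis=0)
    z_b = cross_attention(h_b, joint, w_q, w_k, w_v).mean(axis=0)
    return np.concatenate([z_a, z_b])

rng = np.random.default_rng(0)

def random_graph(n):
    """Hypothetical symmetric, non-negative adjacency with no self-loops."""
    a = rng.random((n, n))
    a = (a + a.T) / 2.0
    np.fill_diagonal(a, 0.0)
    return a

def make_weights(in_dim, growth, layers):
    """Layer i maps (in_dim + i * growth) input features to `growth` new features."""
    return [rng.standard_normal((in_dim + i * growth, growth)) * 0.1
            for i in range(layers)]

# Hypothetical sizes: modality A has 32 nodes, modality B has 8, 5 features per node.
x_a = rng.standard_normal((32, 5))
x_b = rng.standard_normal((8, 5))
h_a = dense_gcn(x_a, random_graph(32), make_weights(5, 4, 3))  # shape (32, 17)
h_b = dense_gcn(x_b, random_graph(8), make_weights(5, 4, 3))   # shape (8, 17)

w_q, w_k, w_v = (rng.standard_normal((17, 16)) * 0.1 for _ in range(3))
fused = joint_fusion(h_a, h_b, w_q, w_k, w_v)                  # shape (32,)
print(h_a.shape, h_b.shape, fused.shape)
```

In this sketch the dense connections make the feature width grow by a fixed amount per layer (5 + 3 × 4 = 17 here), so shallow and deep neighborhood features both reach the fusion stage. A trained model would learn the weight matrices and feed the fused vector to a classifier.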