Countering Modal Redundancy and Heterogeneity: A Self-Correcting Multimodal Fusion

Pengkun Wang, Xu Wang, Binwu Wang, Yudong Zhang, Lei Bai, Yang Wang
DOI: https://doi.org/10.1109/icdm54844.2022.00062
2022-01-01
Abstract: Fusing multimodal heterogeneous data plays a vital role in recognition and prediction tasks across many fields, e.g., action recognition and traffic accident forecasting. Yet key challenges remain, such as heterogeneous feature interaction and feature redundancy, and they significantly affect the performance of multimodal fusion. To tackle these challenges, we first devise a Unified Feature Interaction Module (UFIM), in which a novel orthogonal attention component is designed to capture fine-grained inter-modal interaction information among heterogeneous features. Then, we propose a novel Self-Correcting Transformer Module (SCTM), which employs a modified transformer to obtain the one-to-many correlation information between the current modal feature and the merged features of the other modalities, alleviating the redundancy problem. Extensive experiments on four cross-domain tasks demonstrate the effectiveness and generalization ability of our proposed method.
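The abstract states that SCTM relates the current modal feature to the merged features of the other modalities ("one-to-many" correlation), but it does not give the exact formulation. As a rough illustration only, the sketch below uses plain scaled dot-product cross-attention. It assumes that all modalities are already projected to a shared feature dimension d, and it uses identity query/key/value projections for brevity. The helper names (`cross_modal_attention`, `matmul`, `transpose`, `softmax_rows`) are hypothetical. This is a generic technique, not the authors' SCTM or their orthogonal attention component.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature dims


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x p) -> (n x p)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_modal_attention(current: Matrix, others: List[Matrix]) -> Matrix:
    """Hypothetical sketch: every token of the current modality attends
    to all tokens of the other modalities merged along the token axis.
    Assumes a shared dimension d and identity Q/K/V projections."""
    merged = [row for m in others for row in m]  # concatenate other modalities
    d = len(current[0])
    scores = matmul(current, transpose(merged))  # (n_cur x n_merged)
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)  # each row sums to 1
    return matmul(weights, merged)  # (n_cur x d) context per current token


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]  # toy "current" modality, 2 tokens
    video = [[1.0, 0.0]]
    text = [[0.0, 1.0], [0.5, 0.5]]
    print(cross_modal_attention(audio, [video, text]))
```

Each output row is a convex combination of the other modalities' tokens, weighted by their similarity to one token of the current modality. A real model would add learned projections, multiple heads, and whatever correction mechanism the paper uses to suppress redundant information.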