Relational Graph-Bridged Image-Text Interaction: A Novel Method for Multi-Modal Relation Extraction

Zihao Zheng, Tao He, Ming Liu, Zhongyuan Wang, Ruiji Fu, Bing Qin
DOI: https://doi.org/10.1109/icassp48485.2024.10448507
2024-01-01
Abstract: Multi-modal relation extraction (MRE) requires integrating multi-modal information to identify relationships between entities. Although fine-grained correlations between visual objects and textual words can improve cross-modal interaction, they are typically modeled only implicitly and are hindered by the modality gap. This paper introduces a novel method called relational Graph-Bridged cross-modal InTeraction (GBIT), which explicitly incorporates fine-grained cross-modal correlations into the interaction process. GBIT constructs a fine-grained cross-modal relational graph that acts as a bridge for effective cross-modal interaction across multiple layers. Within GBIT, a gated interaction strategy enables irrelevance-filtered information exchange, and an adaptive integration module performs the final information collation. Extensive experiments on the MRE benchmark demonstrate the superiority of the proposed method.
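The abstract does not give the formulation of the gated interaction, so the following is only a minimal sketch of the general idea, under stated assumptions: each word aggregates features from the visual objects it is linked to in a word-object graph, and a scalar gate scales down messages that align poorly with the word. The function name `gated_graph_update`, the mean aggregation, and the dot-product gate are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_graph_update(text_feats, obj_feats, edges):
    """One illustrative gated interaction step over a word-object graph.

    text_feats: list of word vectors (lists of floats)
    obj_feats:  list of visual-object vectors of the same dimension
    edges:      iterable of (word_index, object_index) pairs; this stands in
                for the cross-modal relational graph (assumed structure)
    """
    edges = list(edges)
    dim = len(text_feats[0])
    updated = []
    for w, word in enumerate(text_feats):
        neighbors = [obj_feats[o] for (wi, o) in edges if wi == w]
        if not neighbors:
            # No linked object: the word representation is left unchanged.
            updated.append(list(word))
            continue
        # Mean of the linked object features (assumed aggregation).
        message = [sum(n[d] for n in neighbors) / len(neighbors)
                   for d in range(dim)]
        # Scalar gate from the word-message dot product (assumed gate form);
        # poorly aligned messages receive a gate close to zero.
        gate = sigmoid(sum(a * b for a, b in zip(word, message)))
        updated.append([word[d] + gate * message[d] for d in range(dim)])
    return updated
```

In this sketch a word with no linked object keeps its representation, and a message pointing away from the word contributes almost nothing, which is one simple way to read "irrelevance-filtered information exchange".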