Abstract: Multimodal Relation Extraction (MRE) extracts relations between entities using multimodal information. Most existing MRE methods have two issues: 1) weak cross-modal correlation and poor semantic consistency; 2) no text-guided fusion of the different modalities, which introduces excessive image noise. To address these issues, we propose an MRE method inspired by genetics: A Comprehensive Genetic-Inspired Model for Multimodal Relation Extraction (CGI-MRE). It consists of two main modules: the Gene Extraction and Recombination Module (GERM) and the Text-Guided Fusion Module (TGFM). In the GERM module, we treat the text features and the visual features each as a feature body, and decompose each feature body into common sub-features and unique sub-features. For these sub-features, we design a Common Gene Extraction Mechanism (CGEM) to extract the common advantageous genes shared by the modalities and a Unique Gene Extraction Mechanism (UGEM) to extract the unique advantageous genes of each modality. Finally, a Gene Recombination Mechanism (GRM) produces recombinant features that are highly correlated across modalities and have strong semantic consistency. In the TGFM module, we fuse the recombined features and extract from them the features that are beneficial to MRE. We use a gate to adjust the text-guided original attention score and the pooling attention score, obtaining a text-guided saliency attention score. With this score, we strictly extract from the image recombinant feature only the information that is text-guided and beneficial to MRE. Experimental results on the MNRE dataset show that our model outperforms state-of-the-art methods, achieving an F1-score of 84.62%.
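
The gating step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the function names (`gated_saliency`, `avg_pool`), the scaled dot-product scoring, the sliding-window average pooling, the scalar sigmoid gate, and the convex combination of the two scores are all hypothetical and are not taken from the CGI-MRE paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def avg_pool(rows, window=3):
    """Average each row with its neighbours (1-D sliding window, clipped at the edges).
    Assumption: this stands in for whatever pooling the paper actually uses."""
    half = window // 2
    pooled = []
    for i in range(len(rows)):
        lo, hi = max(0, i - half), min(len(rows), i + half + 1)
        group = rows[lo:hi]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled

def gated_saliency(text_query, image_feats, gate_logit=0.0):
    """Blend two text-guided attention scores with a scalar gate (hypothetical form).

    text_query  : list[float], a single text feature vector
    image_feats : list[list[float]], one feature vector per image region
    gate_logit  : float, pre-sigmoid gate value (a learned parameter in practice)
    """
    scale = math.sqrt(len(text_query))
    # Text-guided attention over the raw image region features.
    original = softmax([dot(text_query, v) / scale for v in image_feats])
    # Text-guided attention over the pooled image region features.
    pooled = softmax([dot(text_query, v) / scale for v in avg_pool(image_feats)])
    # Sigmoid gate in (0, 1) decides how much each score contributes.
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    saliency = [gate * o + (1.0 - gate) * p for o, p in zip(original, pooled)]
    # Weighted sum of image features under the blended score.
    fused = [sum(s * v[j] for s, v in zip(saliency, image_feats))
             for j in range(len(text_query))]
    return saliency, fused

if __name__ == "__main__":
    scores, feature = gated_saliency([1.0, 0.0],
                                     [[2.0, 0.0], [0.0, 2.0], [0.0, 1.0]])
    print(scores, feature)
```

Because the blended score is a convex combination of two softmax distributions, it still sums to one, so image regions that align poorly with the text receive proportionally little weight in the fused feature.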

Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy

Exploiting Visual Relation and Multi-Grained Knowledge for Multimodal Relation Extraction

Named Entity and Relation Extraction with Multi-Modal Retrieval

Watch and Read! A Visual Relation-Aware and Textual Evidence Enhanced Model for Multimodal Relation Extraction

Multimodal Relation Extraction via a Mixture of Hierarchical Visual Context Learners

Towards Bridged Vision and Language: Learning Cross-modal Knowledge Representation for Relation Extraction

On Analyzing the Role of Image for Visual-Enhanced Relation Extraction (Student Abstract)

Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction

Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction

Relation Extraction with Knowledge-Enhanced Prompt-Tuning on Multimodal Knowledge Graph

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

Multimodal Relational Triple Extraction with Query-based Entity Object Transformer

MORE: A Multimodal Object-Entity Relation Extraction Dataset with a Benchmark Evaluation

Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging

Visual Relations Augmented Cross-modal Retrieval

Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction

Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval