Abstract: Multimodal Relation Extraction (MRE) is the task of extracting relations between entities from multimodal (textual and visual) information. Most existing MRE methods suffer from two issues: (1) weak cross-modal correlation and poor semantic consistency, and (2) the absence of text-guided fusion across modalities, which introduces excessive image noise. To address these issues, we propose a genetics-inspired MRE method, A Comprehensive Genetic-Inspired Model for Multimodal Relation Extraction (CGI-MRE). It consists of two main modules: the Gene Extraction and Recombination Module (GERM) and the Text-Guided Fusion Module (TGFM). In GERM, we treat the text features and the visual features each as a feature body and decompose each feature body into common sub-features and unique sub-features. For these sub-features, we design a Common Gene Extraction Mechanism (CGEM) to extract the advantageous genes shared across modalities and a Unique Gene Extraction Mechanism (UGEM) to extract the advantageous genes unique to each modality. Finally, a Gene Recombination Mechanism (GRM) combines them into recombinant features that are highly correlated across modalities and semantically consistent. In TGFM, we fuse the recombinant features and extract the information that benefits MRE. A gate adjusts the text-guided original attention score and the pooling attention score to produce a text-guided saliency attention score, which we use to extract from the image recombinant features only the text-guided information that benefits MRE. Experimental results on the MNRE dataset show that our model outperforms state-of-the-art methods, achieving an F1-score of 84.62%.
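The abstract does not specify how the two attention scores in TGFM are computed or how the gate is parameterized. The following is a minimal Python sketch of one plausible reading, under our own assumptions, and not the authors' implementation: the "original attention score" is taken to be scaled dot-product attention from each text token over the image tokens, the "pooling attention score" is the same attention queried by a mean-pooled sentence vector, and a scalar sigmoid gate with hypothetical parameters `gate_weight` and `gate_bias` mixes the two. The function name `text_guided_saliency` is likewise hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(vectors: Matrix) -> Vector:
    """Average equal-length vectors (assumed sentence pooling)."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def text_guided_saliency(
    text_tokens: Matrix,
    image_tokens: Matrix,
    gate_weight: Vector,  # hypothetical gate parameters, length 2 * dim
    gate_bias: float = 0.0,
) -> Tuple[Matrix, Matrix]:
    """Return (saliency scores [n_text x n_image], fused image features [n_text x dim])."""
    dim = len(image_tokens[0])
    scale = math.sqrt(dim)
    pooled_text = mean_pool(text_tokens)

    # Assumed "pooling attention score": one distribution over image tokens,
    # queried by the mean-pooled sentence vector.
    pool_scores = softmax([dot(pooled_text, v) / scale for v in image_tokens])

    saliency: Matrix = []
    fused: Matrix = []
    for t in text_tokens:
        # Assumed "original attention score": scaled dot-product attention
        # from this text token over all image tokens.
        orig_scores = softmax([dot(t, v) / scale for v in image_tokens])
        # Scalar sigmoid gate over [token; pooled sentence] (list concatenation).
        g = sigmoid(dot(gate_weight, t + pooled_text) + gate_bias)
        # Convex mix of two distributions, so each row still sums to 1.
        s = [g * o + (1.0 - g) * p for o, p in zip(orig_scores, pool_scores)]
        saliency.append(s)
        # Saliency-weighted sum of image features for this text token.
        fused.append(
            [sum(s[j] * image_tokens[j][k] for j in range(len(image_tokens))) for k in range(dim)]
        )
    return saliency, fused


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]
    image = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    scores, features = text_guided_saliency(text, image, gate_weight=[0.0] * 4)
    for row in scores:
        print([round(x, 3) for x in row])
```

With an all-zero `gate_weight` the gate equals 0.5 and the two scores are simply averaged; a learned gate would shift weight toward token-level or sentence-level guidance per text token.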

CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction
