Abstract: Multimodal Relation Extraction (MRE) is an entity relation extraction method based on multimodal information. Most existing MRE methods have two issues: 1) cross-modal correlation is weak and semantic consistency is poor; 2) they do not achieve text-guided fusion of different modalities, which introduces excessive image noise. To address these issues, we propose an MRE method inspired by genetics: A Comprehensive Genetic-Inspired Model for Multimodal Relation Extraction (CGI-MRE). It consists of two main modules: the Gene Extraction And Recombination Module (GERM) and the Text-Guided Fusion Module (TGFM). In the GERM module, we treat the text features and the visual features each as a feature body, and decompose each feature body into common sub-features and unique sub-features. For these sub-features, we design a Common Gene Extraction Mechanism (CGEM) to extract the common advantageous genes shared across modalities and a Unique Gene Extraction Mechanism (UGEM) to extract the unique advantageous genes of each modality. Finally, we use a Gene Recombination Mechanism (GRM) to obtain recombinant features that are highly correlated across modalities and have strong semantic consistency. In the TGFM module, we fuse the recombined features and extract from them the features that are beneficial to MRE. We use a gate to adjust the text-guided original attention score and the pooling attention score, obtaining the text-guided saliency attention score. With this score, we extract from the image recombinant feature only the information that is text-guided and beneficial to MRE. Experimental results on the MNRE dataset show that our model outperforms state-of-the-art methods and achieves an F1-score of 84.62%.
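The gating step in TGFM can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function name `text_guided_saliency`, the scaled dot product used as the original attention score, the sliding-window average used as the pooling attention score, and the scalar sigmoid gate (which in a trained model would presumably be learned). It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def text_guided_saliency(text_query, image_regions, gate_logit=0.0, window=3):
    """Hypothetical gated mix of two text-to-image attention scores.

    text_query    : one text feature vector (list of floats)
    image_regions : list of image region feature vectors, same dimension
    gate_logit    : scalar fed to a sigmoid gate (assumed, normally learned)
    window        : width of the average-pooling window (assumed)
    """
    scale = math.sqrt(len(text_query))
    # "Original" score: scaled dot product between the text and each region.
    original = [dot(text_query, region) / scale for region in image_regions]

    # "Pooling" score: average of the original scores in a local window.
    half = window // 2
    pooled = []
    for i in range(len(original)):
        lo, hi = max(0, i - half), min(len(original), i + half + 1)
        pooled.append(sum(original[lo:hi]) / (hi - lo))

    # The gate balances the two scores before normalization.
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    mixed = [gate * o + (1.0 - gate) * p for o, p in zip(original, pooled)]
    weights = softmax(mixed)

    # Text-guided image feature: attention-weighted sum of the regions.
    dim = len(image_regions[0])
    fused = [sum(w * r[d] for w, r in zip(weights, image_regions))
             for d in range(dim)]
    return weights, fused


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights, fused = text_guided_saliency([1.0, 0.0], regions)
    print(weights, fused)
```

In this sketch a large positive `gate_logit` makes the result follow the original attention score, while a large negative value makes it follow the pooled score; regions that score low under both receive little weight, which is one way to read the abstract's goal of limiting image noise.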

Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization

Exploiting Visual Relation and Multi-Grained Knowledge for Multimodal Relation Extraction

CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction

On Analyzing the Role of Image for Visual-Enhanced Relation Extraction (Student Abstract)

Enhancing Multimodal Entity and Relation Extraction With Variational Information Bottleneck

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

Watch and Read! A Visual Relation-Aware and Textual Evidence Enhanced Model for Multimodal Relation Extraction

Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

Using Augmented Small Multimodal Models to Guide Large Language Models for Multimodal Relation Extraction

Joint Multimodal Entity-Relation Extraction Based on Edge-Enhanced Graph Alignment Network and Word-Pair Relation Tagging

EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning

Visual Description Augmented Integration Network for Multimodal Entity and Relation Extraction

Named Entity and Relation Extraction with Multi-Modal Retrieval

On Analyzing the Role of Image for Visual-enhanced Relation Extraction

Multimodal Relation Extraction via a Mixture of Hierarchical Visual Context Learners

Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction

Entity-Aware Multimodal Alignment Framework for News Image Captioning

MAF - A General Matching and Alignment Framework for Multimodal Named Entity Recognition