Abstract: Multimodal Relation Extraction (MRE) extracts relations between entities using multimodal information. Most existing MRE methods have two issues: 1) weak cross-modal correlation and poor semantic consistency, and 2) no text-guided fusion of the different modalities, which introduces excessive image noise. To address these issues, we propose an MRE method inspired by genetics: A Comprehensive Genetic-Inspired Model for Multimodal Relation Extraction (CGI-MRE). It consists of two main modules: the Gene Extraction and Recombination Module (GERM) and the Text-Guided Fusion Module (TGFM). In the GERM module, we treat the text features and the visual features each as a feature body, and decompose each feature body into common sub-features and unique sub-features. For these sub-features, we design a Common Gene Extraction Mechanism (CGEM) to extract the common advantageous genes shared across modalities and a Unique Gene Extraction Mechanism (UGEM) to extract the unique advantageous genes of each modality. A Gene Recombination Mechanism (GRM) then produces recombinant features that are highly correlated across modalities and have strong semantic consistency. In the TGFM module, we fuse the recombined features and extract from them the features that are beneficial to MRE. A gate adjusts the text-guided original attention score and the pooling attention score to obtain a text-guided saliency attention score. With this score, we extract from the image recombinant feature only the information that is text-guided and beneficial to MRE. Experimental results on the MNRE dataset show that our model surpasses the previous state of the art, achieving an F1-score of 84.62%.
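
The gating step described for TGFM can be illustrated with a minimal NumPy sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: scaled dot-product attention as the "original" score, mean pooling of the text tokens for the "pooling" score, a per-token sigmoid gate, and the hypothetical names `text_guided_saliency`, `w_gate`, and `b_gate`. It is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def text_guided_saliency(text, image, w_gate, b_gate=0.0):
    """Blend two text-guided attention scores with a gate (illustrative only).

    text:   (n_tokens, d)  text features
    image:  (n_regions, d) image recombinant features
    w_gate: (d,)           hypothetical gate weights
    """
    d = text.shape[-1]

    # Assumed "original" attention: every text token scores every image region.
    original = softmax(text @ image.T / np.sqrt(d), axis=-1)  # (n_tokens, n_regions)

    # Assumed "pooling" attention: a mean-pooled text summary scores every region.
    pooled_text = text.mean(axis=0, keepdims=True)            # (1, d)
    pooled = softmax(pooled_text @ image.T / np.sqrt(d), axis=-1)  # (1, n_regions)

    # Per-token gate in (0, 1) decides how much to trust each score.
    gate = sigmoid(text @ w_gate + b_gate)[:, None]           # (n_tokens, 1)

    # Convex combination, so each row is still a distribution over regions.
    saliency = gate * original + (1.0 - gate) * pooled

    # Use the saliency score to pull text-relevant information from the image.
    attended = saliency @ image                               # (n_tokens, d)
    return saliency, attended


rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 8))    # 5 tokens, 8-dim features (toy sizes)
image_feats = rng.normal(size=(7, 8))   # 7 image regions
saliency, attended = text_guided_saliency(text_feats, image_feats, rng.normal(size=8))
print(saliency.shape, attended.shape)
```

Because the gate forms a convex combination of two attention distributions, each row of the saliency score remains a valid distribution over image regions, which is one simple way to keep the fusion driven by the text side.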

CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction

Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization

Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Exploiting Visual Relation and Multi-Grained Knowledge for Multimodal Relation Extraction

Watch and Read! A Visual Relation-Aware and Textual Evidence Enhanced Model for Multimodal Relation Extraction

Learning from Different Text-Image Pairs: A Relation-enhanced Graph Convolutional Network for Multimodal NER

Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging

Named Entity and Relation Extraction with Multi-Modal Retrieval

Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy

Focus & Gating: A Multimodal Approach for Unveiling Relations in Noisy Social Media

Using Augmented Small Multimodal Models to Guide Large Language Models for Multimodal Relation Extraction

Multimodal Relation Extraction via a Mixture of Hierarchical Visual Context Learners

Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction

Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction

Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition

MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description

Relation Extraction with Knowledge-Enhanced Prompt-Tuning on Multimodal Knowledge Graph

Visual Description Augmented Integration Network for Multimodal Entity and Relation Extraction