Abstract: Multimodal Relation Extraction (MRE) is an entity relationship extraction method based on multimodal information. Most existing MRE methods have two issues: 1) weak cross-modal correlation and poor semantic consistency; 2) no text-guided fusion of the different modalities, which introduces excessive image noise. To address these issues, we propose an MRE method inspired by genetics: a Comprehensive Genetic-Inspired Model for Multimodal Relation Extraction (CGI-MRE). It consists of two main modules: the Gene Extraction and Recombination Module (GERM) and the Text-Guided Fusion Module (TGFM). In the GERM module, we regard the text features and the visual features each as a feature body, and decompose each feature body into common sub-features and unique sub-features. For these sub-features, we design a Common Gene Extraction Mechanism (CGEM) to extract the common advantageous genes shared across modalities and a Unique Gene Extraction Mechanism (UGEM) to extract the unique advantageous genes of each modality. Finally, we use a Gene Recombination Mechanism (GRM) to obtain recombinant features that are highly correlated across modalities and have strong semantic consistency. In the TGFM module, we fuse the recombinant features and extract from them the features that are beneficial to MRE. We use a gate to adjust the text-guided original attention score and the pooling attention score, which yields the text-guided saliency attention score. With this score, we strictly extract from the image recombinant feature the information that is text-guided and beneficial to MRE. Experimental results on the MNRE dataset show that our model outperforms state-of-the-art methods and achieves an F1-score of 84.62%.
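
The gating step described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the hypothetical function `gated_saliency_attention`, the scalar sigmoid gate, the scaled dot-product form of the "original" score, and the use of 1-D average pooling over neighbouring image regions for the "pooling" score. It is not the CGI-MRE implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_saliency_attention(text_query, image_regions, gate_logit=0.0, window=3):
    """Blend two text-guided attention distributions with a scalar gate.

    All modelling choices here are assumptions, not the paper's definitions.
    """
    d = len(text_query)

    # Assumed "original" score: scaled dot product of the text query with each region.
    raw = [sum(q * r for q, r in zip(text_query, region)) / math.sqrt(d)
           for region in image_regions]
    original = softmax(raw)

    # Assumed "pooling" score: the same logits after 1-D average pooling
    # over a window of neighbouring regions.
    half = window // 2
    pooled_raw = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        pooled_raw.append(sum(raw[lo:hi]) / (hi - lo))
    pooled = softmax(pooled_raw)

    # Assumed gate: a sigmoid of a single (in practice learned) logit.
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    saliency = [g * o + (1.0 - g) * p for o, p in zip(original, pooled)]

    # Use the saliency score to pool the image regions into one fused vector.
    dim = len(image_regions[0])
    fused = [sum(w * region[k] for w, region in zip(saliency, image_regions))
             for k in range(dim)]
    return saliency, fused


query = [1.0, 0.0]
regions = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
saliency, fused = gated_saliency_attention(query, regions)
```

Because the gate forms a convex combination of two probability distributions, the resulting saliency score is itself a distribution over image regions, so regions that the text query does not support receive little weight in the fused vector.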

Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy

Named Entity and Relation Extraction with Multi-Modal Retrieval

Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

Exploiting Visual Relation and Multi-Grained Knowledge for Multimodal Relation Extraction

Joint Multimodal Entity-Relation Extraction Based on Edge-Enhanced Graph Alignment Network and Word-Pair Relation Tagging

Watch and Read! A Visual Relation-Aware and Textual Evidence Enhanced Model for Multimodal Relation Extraction

Learning from Different Text-Image Pairs: A Relation-enhanced Graph Convolutional Network for Multimodal NER

Using Augmented Small Multimodal Models to Guide Large Language Models for Multimodal Relation Extraction

Multimodal Relation Extraction via a Mixture of Hierarchical Visual Context Learners

CAT-MNER: Multimodal Named Entity Recognition with Knowledge-Refined Cross-Modal Attention

Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization

Multimodal heterogeneous graph entity-level fusion for named entity recognition with multi-granularity visual guidance

Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction

Visual Description Augmented Integration Network for Multimodal Entity and Relation Extraction

MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding

On Analyzing the Role of Image for Visual-Enhanced Relation Extraction (Student Abstract)

CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction

Focus & Gating: A Multimodal Approach for Unveiling Relations in Noisy Social Media

MAF - A General Matching and Alignment Framework for Multimodal Named Entity Recognition