Abstract: Entity alignment determines whether entities from different sources refer to the same real-world object. It is one of the key technologies for constructing large-scale knowledge graphs and is widely used in the fields of knowledge graphs and knowledge completion. For person entities, there is little semantic connection between the visual-modality face attribute and the text-modality attribute and relationship information, so it is difficult to model the visual and text modalities in the same semantic space, and, as a result, traditional multimodal entity alignment methods cannot be applied. To address the scarcity of multimodal person relation graph datasets and the difficulty of multimodal semantic modeling of person entities, this paper analyzes and crawls open-source semi-structured data from different sources to build a multimodal person entity alignment dataset. The proposed method uses the facial and semantic information of multimodal person entities to improve the similarity of entity structural features, which are modeled using a graph convolution layer and a dynamic graph attention layer. The method is evaluated on the self-built multimodal person entity alignment dataset and compared with other entity alignment models that have a similar structure. Compared with AliNet, the probability that the first item in the candidate pre-aligned entity set is correct increases by 12.4%, and the average rank of correctly aligned entities in the candidate pre-aligned entity set decreases by 32.8. These results demonstrate the positive effect of integrating multimodal facial information and of applying dynamic graph attention and a layer-wise gated network on the alignment of person entities.
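The two evaluation figures quoted in the abstract (the probability that the first candidate is correct, and the average rank of the correct entity) can be illustrated with a minimal sketch. The abstract does not name these metrics or describe how they are computed; they resemble what the entity alignment literature often calls Hits@1 and mean rank, and the code below assumes that reading. The function names, the tie-breaking rule, and the toy similarity matrix are all hypothetical and are not taken from the paper.

```python
def rank_of_gold(similarity_row, gold_index):
    """1-based rank of the gold candidate when candidates are sorted by descending similarity."""
    gold_score = similarity_row[gold_index]
    # Count candidates that score strictly higher than the gold one.
    # Assumption: ties are resolved in favour of the gold candidate.
    return 1 + sum(1 for score in similarity_row if score > gold_score)


def alignment_metrics(similarity, gold):
    """Return (hits_at_1, mean_rank) for a similarity matrix and gold target indices."""
    ranks = [rank_of_gold(row, g) for row, g in zip(similarity, gold)]
    hits_at_1 = sum(1 for r in ranks if r == 1) / len(ranks)
    mean_rank = sum(ranks) / len(ranks)
    return hits_at_1, mean_rank


# Toy data, made up for illustration: 3 source entities, 3 candidate targets each.
similarity = [
    [0.9, 0.2, 0.1],  # gold candidate 0 is ranked first
    [0.3, 0.4, 0.8],  # gold candidate 1 is ranked second
    [0.5, 0.6, 0.7],  # gold candidate 2 is ranked first
]
gold = [0, 1, 2]

hits_at_1, mean_rank = alignment_metrics(similarity, gold)
print(f"first-candidate accuracy: {hits_at_1:.3f}, average rank: {mean_rank:.3f}")
```

Under this reading, a better model raises the first value and lowers the second, which matches the direction of the changes reported against AliNet.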

Entity-Aware Multimodal Alignment Framework for News Image Captioning

EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning

An Entity Alignment Method Based on Graph Attention Network with Pre-classification

Transform and Tell: Entity-Aware News Image Captioning

Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning

Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization

Multi-Modal Entity Alignment Method Based on Feature Enhancement

Improving Fake News Detection by Using an Entity-enhanced Framework to Fuse Diverse Multimodal Clues

Entity Linking Supported Multimodal Data: Fusing Text and Image features for Higher Accuracy

MAF - A General Matching and Alignment Framework for Multimodal Named Entity Recognition

Multi-modal Contrastive Representation Learning for Entity Alignment

Image-relevant Entities Knowledge aware News Image Captioning

Leveraging Intra-modal and Inter-modal Interaction for Multi-Modal Entity Alignment

SCMEA: A stacked co-enhanced model for entity alignment based on multi-aspect information fusion and bidirectional contrastive learning

HybridVocab: Towards Multi-Modal Machine Translation Via Multi-Aspect Alignment

MCSFF: Multi-modal Consistency and Specificity Fusion Framework for Entity Alignment

Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment

Generalizable Entity Grounding via Assistance of Large Language Model

Person Entity Alignment Method Based on Multimodal Information Aggregation