M3: A Multi-Image Multi-Modal Entity Alignment Dataset

Shiqi Zhang, Weixin Zeng, Zhen Tan, Xiang Zhao, Weidong Xiao
DOI: https://doi.org/10.1145/3627673.3679126
2024-01-01
Abstract: Multi-modal Entity Alignment (MMEA) aims to identify equivalent entities across different multi-modal knowledge graphs (MMKGs), which facilitates their integration and improves their coverage. However, current MMEA datasets have limitations that do not reflect real-world challenges: low entity coverage, a single image per entity, high inter-image correlation, and images sourced from the same search engine. These oversimplified scenarios may hinder the fair comparison and development of alignment solutions. To address this problem, we first construct M3, an MMEA benchmark in which entities are equipped with multiple images from different search engines, reflecting real-world scenarios. Additionally, we design a simple and universal multi-image processing module, AMIA, which assigns varying attention weights to the images associated with each entity to model visual information effectively. Experimental results validate both the difficulty of M3 and the effectiveness of AMIA. Despite AMIA's superior performance, there remains room for more advanced solutions to these difficulties. Our dataset is publicly released.
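The abstract describes AMIA only as assigning varying attention weights to the images associated with an entity. The sketch below illustrates that general idea with generic scaled dot-product attention pooling in pure Python; it is not the paper's formulation. The entity-level query vector (for example, a structural embedding of the entity), the scoring function, and the names `softmax` and `aggregate_images` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_images(
    query: Sequence[float], images: Sequence[Sequence[float]]
) -> Tuple[Vector, Vector]:
    """Hypothetical attention pooling over one entity's image embeddings.

    query:  an entity-level vector used to score the images (an assumption,
            e.g. the entity's structural embedding), of dimension d.
    images: n image embeddings of dimension d; n may differ per entity.
    Returns (weights, pooled): weights sum to 1, and pooled is the
    attention-weighted sum of the image embeddings.
    """
    if not images:
        raise ValueError("entity has no images")
    d = len(query)
    if any(len(img) != d for img in images):
        raise ValueError("image embeddings must match the query dimension")
    scale = math.sqrt(d)
    # Scaled dot-product score between the query and each image.
    scores = [sum(q * x for q, x in zip(query, img)) / scale for img in images]
    weights = softmax(scores)
    # Weighted sum over images, computed one dimension at a time.
    pooled = [sum(w * img[k] for w, img in zip(weights, images)) for k in range(d)]
    return weights, pooled


if __name__ == "__main__":
    # Toy entity with three images; the first and third point roughly
    # in the query's direction, the second does not.
    weights, pooled = aggregate_images([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
    print([round(w, 3) for w in weights], [round(p, 3) for p in pooled])
```

In this toy run, the image most similar to the query receives the largest weight and the dissimilar one the smallest. Normalizing with a softmax turns a variable number of images into one fixed-size vector per entity, and images that score poorly against the query (for instance, noisy search-engine results) contribute less to the pooled representation; whether AMIA uses this exact scoring is not stated in the abstract.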