A Survey of Multi-modal Knowledge Graphs: Technologies and Trends

Wanying Liang, Pasquale De Meo, Yong Tang, Jia Zhu
DOI: https://doi.org/10.1145/3656579
IF: 16.6
2024-04-10
ACM Computing Surveys
Abstract: In recent years, Knowledge Graphs (KGs) have played a crucial role in the development of advanced knowledge-intensive applications, such as recommender systems and semantic search. However, the human sensory system is inherently multi-modal, as objects around us are often represented by a combination of multiple signals, such as visual and textual. Consequently, Multi-modal Knowledge Graphs (MMKGs), which combine structured knowledge representation with multiple modalities, represent a powerful extension of KGs. Although MMKGs can handle certain types of tasks (e.g., visual query answering) or queries that standard KGs cannot process, and they can effectively tackle some standard problems (e.g., entity alignment), we lack a widely accepted definition of MMKG. In this survey, we provide a rigorous definition of MMKGs along with a classification scheme based on how existing approaches address four fundamental challenges: representation, fusion, alignment, and translation, which are crucial to improving an MMKG. Our classification scheme is flexible and allows for easy incorporation of new approaches, as well as a comparison of two approaches in terms of how they address one of the fundamental challenges mentioned above. As the first comprehensive survey of MMKG, this article aims to inspire and provide a reference for relevant researchers in the field of Artificial Intelligence.
computer science, theory & methods
What problem does this paper attempt to address?
The paper primarily explores the concepts, technologies, and trends of Multimodal Knowledge Graphs (MMKGs) and attempts to address the following core issues:
1. **Definition and Understanding of MMKGs**: The paper provides a rigorous definition of Multimodal Knowledge Graphs (MMKGs) and distinguishes MMKGs from other related fields, such as multimodal machine learning. With a clear definition, researchers can better understand the application scenarios and limitations of MMKGs.
2. **Construction Methods**: The paper details the methods and techniques for constructing MMKGs, including methods based on structured data and unstructured data. This helps researchers choose appropriate construction strategies based on specific application scenarios.
3. **Core Challenges**: The paper identifies and discusses four core technical challenges faced by MMKGs: representation, fusion, alignment, and translation. These challenges are crucial for improving the performance of MMKGs.
4. **Role of Pre-training Techniques**: The paper emphasizes the value of pre-training techniques in MMKGs, especially in feature embedding, auxiliary tasks, and downstream tasks.
5. **Downstream Tasks and Evaluation Criteria**: The paper explores the downstream tasks of MMKGs in different application fields and the corresponding evaluation criteria, which helps assess the actual effectiveness and potential improvement directions of MMKGs.
6. **Existing Issues and Future Development Directions**: The paper points out the current issues faced by MMKGs and proposes suggestions and directions for future development, aiming to promote further development in this field.
In summary, the main goal of this paper is to provide a comprehensive survey of the field of Multimodal Knowledge Graphs, including its definition, construction methods, core technical challenges, the impact of pre-training techniques, downstream tasks and applications, existing issues, and future prospects, to inspire interest among relevant researchers and provide them with a reference.