Abstract: In recent years, Knowledge Graphs (KGs) have played a crucial role in the development of advanced knowledge-intensive applications, such as recommender systems and semantic search. However, the human sensory system is inherently multi-modal, as objects around us are often represented by a combination of multiple signals, such as visual and textual. Consequently, Multi-modal Knowledge Graphs (MMKGs), which combine structured knowledge representation with multiple modalities, represent a powerful extension of KGs. Although MMKGs can handle certain types of tasks (e.g., visual query answering) or queries that standard KGs cannot process, and they can effectively tackle some standard problems (e.g., entity alignment), we lack a widely accepted definition of MMKG. In this survey, we provide a rigorous definition of MMKGs along with a classification scheme based on how existing approaches address four fundamental challenges: representation, fusion, alignment, and translation, which are crucial to improving an MMKG. Our classification scheme is flexible and allows for easy incorporation of new approaches, as well as a comparison of two approaches in terms of how they address one of the fundamental challenges mentioned above. As the first comprehensive survey of MMKG, this article aims to inspire and provide a reference for relevant researchers in the field of Artificial Intelligence.

What problem does this paper attempt to address?

The paper primarily explores the concepts, technologies, and trends of Multimodal Knowledge Graphs (MMKGs) and attempts to address the following core issues:

1. **Definition and Understanding of MMKGs**: The paper provides a rigorous definition of Multimodal Knowledge Graphs (MMKGs) and distinguishes MMKGs from other related fields, such as multimodal machine learning. With a clear definition, researchers can better understand the application scenarios and limitations of MMKGs.
2. **Construction Methods**: The paper details the methods and techniques for constructing MMKGs, including methods based on structured data and unstructured data. This helps researchers choose appropriate construction strategies based on specific application scenarios.
3. **Core Challenges**: The paper identifies and discusses four core technical challenges faced by MMKGs: representation, fusion, alignment, and translation. These challenges are crucial for improving the performance of MMKGs.
4. **Role of Pre-training Techniques**: The paper emphasizes the value of pre-training techniques in MMKGs, especially in feature embedding, auxiliary tasks, and downstream tasks.
5. **Downstream Tasks and Evaluation Criteria**: The paper explores the downstream tasks of MMKGs in different application fields and the corresponding evaluation criteria, which helps assess the actual effectiveness and potential improvement directions of MMKGs.
6. **Existing Issues and Future Development Directions**: The paper points out the current issues faced by MMKGs and proposes suggestions and directions for future development, aiming to promote further development in this field.

In summary, the main goal of this paper is to provide a comprehensive survey of the field of Multimodal Knowledge Graphs, including its definition, construction methods, core technical challenges, the impact of pre-training techniques, downstream tasks and applications, existing issues, and future prospects, to inspire interest among relevant researchers and provide them with a reference.
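To make the idea of "structured knowledge combined with multiple modalities" concrete, the following is a minimal Python sketch. It does not reproduce the paper's formal definition, which is not quoted here; the class names (`Triple`, `MultiModalKG`), the method names, and the example entities and file path are all hypothetical, chosen only for illustration. It assumes one simple design in which ordinary (head, relation, tail) triples are kept alongside per-entity attachments grouped by modality.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """A plain KG fact: (head, relation, tail). Hypothetical illustrative type."""
    head: str
    relation: str
    tail: str


@dataclass
class MultiModalKG:
    """Illustrative container, not the survey's definition.

    triples:    the structured part of the graph
    modal_data: entity -> modality name -> list of payloads
                (payloads here are just strings, e.g. a caption or a file path)
    """
    triples: set = field(default_factory=set)
    modal_data: dict = field(default_factory=dict)

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        self.triples.add(Triple(head, relation, tail))

    def attach(self, entity: str, modality: str, payload: str) -> None:
        # Create the nested dict/list on first use, then append the payload.
        self.modal_data.setdefault(entity, {}).setdefault(modality, []).append(payload)

    def modalities_of(self, entity: str) -> list:
        # Sorted names of the modalities attached to an entity.
        return sorted(self.modal_data.get(entity, {}))


# Example usage with made-up data.
kg = MultiModalKG()
kg.add_triple("Eiffel_Tower", "locatedIn", "Paris")
kg.attach("Eiffel_Tower", "text", "A wrought-iron lattice tower.")
kg.attach("Eiffel_Tower", "image", "images/eiffel_tower.jpg")  # hypothetical path
```

In this sketch an entity with no attachments simply reports an empty modality list, so a standard KG is the special case in which `modal_data` stays empty.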

A Survey of Multi-modal Knowledge Graphs: Technologies and Trends

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Multi-Modal Knowledge Graph Construction and Application: A Survey

A Survey on Multimodal Knowledge Graphs: Construction, Completion and Applications

MMKG: Multi-modal Knowledge Graphs

Cross-Modal Knowledge Discovery, Inference, and Challenges

A survey on knowledge-enhanced multimodal learning

Multi-modal Recommendation Based on Knowledge Graph

Evolving to multi-modal knowledge graphs for engineering design: state-of-the-art and future challenges

A Survey on Knowledge Graphs: Representation, Acquisition, and Applications

A Survey on Knowledge Graphs: Representation, Acquisition and Applications

Continual Multimodal Knowledge Graph Construction

Multi-modal knowledge graphs representation learning via multi-headed self-attention

Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey

The Power of Noise: Toward a Unified Multi-modal Knowledge Graph Representation Framework

A Survey on Knowledge Graph Embedding: Approaches, Applications and Benchmarks

A Decade of Knowledge Graphs in Natural Language Processing: A Survey

A Comprehensive Survey of Graph Neural Networks for Knowledge Graphs

MMEKG: Multi-modal Event Knowledge Graph Towards Universal Representation Across Modalities

Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities