Abstract: Multimodal data pervades various domains, including healthcare, social media, and transportation, where multimodal graphs play a pivotal role. Machine learning on multimodal graphs, referred to as multimodal graph learning (MGL), is essential for successful artificial intelligence (AI) applications. The burgeoning research in this field encompasses diverse graph data types and modalities, learning techniques, and application scenarios. This survey paper conducts a comparative analysis of existing works in multimodal graph learning, elucidating how multimodal learning is achieved across different graph types and exploring the characteristics of prevalent learning techniques. Additionally, we delineate significant applications of multimodal graph learning and offer insights into future directions in this domain. Consequently, this paper serves as a foundational resource for researchers seeking to comprehend existing MGL techniques and their applicability across diverse scenarios.

What problem does this paper attempt to address?

This paper is a survey of Multimodal Graph Learning (MGL). Multimodal data is common in fields such as healthcare, social media, and transportation, and multimodal graphs play a crucial role in these fields. MGL represents data of different types and modalities as graphs in order to explore both the cross-modal correlations and the intrinsic correlations of multimodal data. The paper identifies the main challenge as processing and integrating knowledge from different modalities effectively, especially when fusing data under complex graph topologies.
The paper first explains why multimodal learning matters, particularly how it improves AI applications. It then defines multimodal graphs and divides them into three types according to where the data modalities are distributed: at the feature level, the node level, or the graph level. Next, the authors classify existing MGL methods into Multimodal Graph Convolutional Networks (MGCN), Multimodal Graph Attention Networks (MGAT), and Multimodal Graph Contrastive Learning (MGCL), and list representative works and application scenarios for each.
The paper also discusses the characteristics and limitations of these methods. MGCN can effectively extract cross-modal relationships on node-level graphs but may be inadequate at aggregating long-range information. MGAT captures long-range information through attention mechanisms but may be biased towards certain modalities on node-level graphs. MGCL extracts similarities and differences between modalities through contrastive learning but still faces challenges when extended to graph-level data with multiple modalities. Lastly, the paper outlines important applications of MGL, such as multimodal knowledge graphs, mentions relevant libraries and tasks, and points to future research directions.
In summary, this paper aims to provide researchers with a comprehensive overview of multimodal graph learning, enabling them to understand existing techniques and guide their applications in different scenarios.
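As a rough illustration of the feature-level case and the graph-convolution idea described above, the following Python sketch fuses two modalities per node by concatenation and then runs one propagation step that averages each node's vector with those of its neighbors. This is a toy sketch under stated assumptions, not a method taken from the survey: the function names, the three-node graph, and the feature values are all hypothetical, and the step omits the learned weight matrix and nonlinearity that a real graph convolutional layer would apply.

```python
# Toy sketch only: names, graph, and values are hypothetical, not from the survey.

def fuse_features(text_feats, image_feats):
    """Feature-level fusion: concatenate each node's per-modality vectors."""
    return [t + i for t, i in zip(text_feats, image_feats)]


def mean_aggregate(features, edges):
    """One propagation step: average each node's vector with its neighbors'.

    Self-loops are included so a node keeps part of its own signal.
    No learned weights or activation are applied (a simplifying assumption).
    """
    n = len(features)
    neighbors = {v: {v} for v in range(n)}  # start with self-loops
    for u, v in edges:                      # treat edges as undirected
        neighbors[u].add(v)
        neighbors[v].add(u)
    dim = len(features[0])
    out = []
    for v in range(n):
        nbrs = neighbors[v]
        out.append([sum(features[u][d] for u in nbrs) / len(nbrs)
                    for d in range(dim)])
    return out


# Three nodes, each with a 2-d "text" vector and a 1-d "image" vector.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[2.0], [4.0], [6.0]]
edges = [(0, 1), (1, 2)]  # a simple path graph 0 - 1 - 2

fused = fuse_features(text_feats, image_feats)   # 3-d vector per node
hidden = mean_aggregate(fused, edges)            # smoothed over neighbors
print(hidden[0])  # [0.5, 0.5, 3.0]
```

Because one step only mixes information between directly connected nodes, node 0 never sees node 2 here, which is a small-scale view of the long-range aggregation limitation noted for MGCN.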

Learning on Multimodal Graphs: A Survey

A Survey on Multimodal Knowledge Graphs: Construction, Completion and Applications

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Multi-Modal Knowledge Graph Construction and Application: A Survey

Cross-Modal Knowledge Discovery, Inference, and Challenges

A Survey of Multi-modal Knowledge Graphs: Technologies and Trends

Multimodal Graph Benchmark

Vision+X: A Survey on Multimodal Learning in the Light of Data

Multimodal Machine Learning: A Survey and Taxonomy

Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

Deep Multimodal Learning with Missing Modality: A Survey

A survey of multimodal federated learning: background, applications, and perspectives

Self-Supervised Multimodal Learning: A Survey

Curriculum Graph Machine Learning: A Survey

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

Multimodal Graph for Unaligned Multimodal Sequence Analysis via Graph Convolution and Graph Pooling

A Survey of Multimodal Large Language Model from A Data-centric Perspective

A Survey of Multimodal Composite Editing and Retrieval

Multimodality in meta-learning: A comprehensive survey

Multimodal Image Synthesis and Editing: A Survey and Taxonomy