Learning on Multimodal Graphs: A Survey

Ciyuan Peng, Jiayuan He, Feng Xia
2024-02-08
Abstract: Multimodal data pervades various domains, including healthcare, social media, and transportation, where multimodal graphs play a pivotal role. Machine learning on multimodal graphs, referred to as multimodal graph learning (MGL), is essential for successful artificial intelligence (AI) applications. The burgeoning research in this field encompasses diverse graph data types and modalities, learning techniques, and application scenarios. This survey paper conducts a comparative analysis of existing works in multimodal graph learning, elucidating how multimodal learning is achieved across different graph types and exploring the characteristics of prevalent learning techniques. Additionally, we delineate significant applications of multimodal graph learning and offer insights into future directions in this domain. Consequently, this paper serves as a foundational resource for researchers seeking to comprehend existing MGL techniques and their applicability across diverse scenarios.
Artificial Intelligence, Machine Learning, Graphics, Social and Information Networks
What problem does this paper attempt to address?
This paper is a survey of multimodal graph learning (MGL). Multimodal data is common in fields such as healthcare, social media, and transportation, where multimodal graphs play a crucial role. MGL uses graph representations to process different graph data types and modalities, with the aim of exploring both the cross-modal and the intrinsic correlations of multimodal data. The paper highlights the challenges the field faces, chiefly how to process and integrate knowledge from different modalities effectively, especially when fusing data under complex graph topologies.
The paper first introduces the importance of multimodal learning, particularly how it enhances AI applications. It then defines multimodal graphs and categorizes them into three types according to whether the data modalities are distributed at the feature, node, or graph level. Next, the authors classify existing MGL methods, including Multimodal Graph Convolutional Networks (MGCN), Multimodal Graph Attention Networks (MGAT), and Multimodal Graph Contrastive Learning (MGCL), and list representative works and their application scenarios.
The paper also discusses the characteristics and limitations of these methods. MGCN can effectively extract cross-modal relationships when processing node-level graphs but may be inadequate at aggregating long-range information. MGAT captures long-range information through attention mechanisms but may be biased towards certain modalities when dealing with node-level graphs. MGCL extracts similarities and differences between modalities through contrastive learning strategies but still faces challenges when extended to graph-level data with multiple modalities.
Lastly, the paper outlines important applications of multimodal graph learning, such as multimodal knowledge graphs, mentions relevant libraries and tasks, and points to future research directions.
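The long-range limitation noted for convolution-style methods can be illustrated with a small sketch. The code below is not taken from the survey: it assumes a toy feature-level graph in which every node carries two hypothetical modalities ("text" and "image") fused by concatenation, and it replaces a real graph-convolution layer with a weight-free averaging step over each node's neighbourhood. All function names and data are illustrative assumptions.

```python
# Sketch only: a weight-free, mean-aggregation stand-in for a graph-convolution
# layer. All names and the toy graph are hypothetical, not taken from the survey.

def fuse_features(text_feats, image_feats):
    """Early fusion: concatenate each node's feature vectors from two modalities."""
    return [t + i for t, i in zip(text_feats, image_feats)]


def normalized_adjacency(edges, num_nodes):
    """Row-normalized adjacency matrix of an undirected graph with self-loops."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for n in range(num_nodes):
        adj[n][n] = 1.0  # self-loop keeps each node's own features
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    return [[a / sum(row) for a in row] for row in adj]


def propagate(adj, feats):
    """One step: every node averages features over itself and its neighbours."""
    dim = len(feats[0])
    return [
        [sum(adj[i][j] * feats[j][d] for j in range(len(feats))) for d in range(dim)]
        for i in range(len(feats))
    ]


# Toy path graph 0 - 1 - 2 - 3; only node 3 carries a non-zero signal.
edges = [(0, 1), (1, 2), (2, 3)]
text_feats = [[0.0], [0.0], [0.0], [1.0]]   # assumed "text" modality
image_feats = [[0.0], [0.0], [0.0], [1.0]]  # assumed "image" modality

adj = normalized_adjacency(edges, num_nodes=4)
h0 = fuse_features(text_feats, image_feats)
h1 = propagate(adj, h0)
h2 = propagate(adj, h1)
h3 = propagate(adj, h2)

# Node 0 is three hops from node 3, so it sees the signal only after step 3.
print(h1[0], h2[0], h3[0])
```

Under these assumptions, node 0 stays at zero after one and two propagation steps and becomes non-zero only after the third. Each step widens the receptive field by a single hop, which is one way to read the remark that convolution-based aggregation can struggle with long-range information.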
In summary, this paper aims to give researchers a comprehensive overview of multimodal graph learning, helping them understand existing techniques and apply them in different scenarios.