Abstract: Graph neural networks (GNNs) have shown great potential for personalized recommendation. The core idea is to reorganize interaction data as a user-item bipartite graph and exploit high-order connectivity among user and item nodes to enrich their representations. Despite this success, most existing works build the interaction graph from ID information only, forgoing item content from multiple modalities (e.g., visual, acoustic, and textual features of micro-video items). Distinguishing personal interests across modalities at a fine granularity was not explored until the recently proposed MMGCN (Wei et al., 2019). However, MMGCN simply applies GNNs to parallel interaction graphs and treats the information propagated from all neighbors equally, so it fails to capture user preference adaptively. As a result, the learned representations may retain redundant or even noisy information, leading to poor robustness and suboptimal performance. In this work, we investigate how to apply GNNs to multimodal interaction graphs in order to adaptively capture user preference on different modalities and to offer an in-depth analysis of why an item suits a user. To this end, we propose a new Multimodal Graph Attention Network, MGAT for short, which disentangles personal interests at the granularity of modality. In particular, built upon multimodal interaction graphs, MGAT conducts information propagation within each individual graph, while leveraging a gated attention mechanism to assign varying importance scores to different modalities with respect to user preference. As such, it is able to capture more complex interaction patterns hidden in user behaviors and provide more accurate recommendations. Empirical results on two micro-video recommendation datasets, Tiktok and MovieLens, show that MGAT achieves substantial improvements over state-of-the-art baselines such as NGCF (Wang, He, et al., 2019) and MMGCN (Wei et al., 2019).
Further analysis through a case study illustrates how MGAT generates attentive information flow over multimodal interaction graphs.
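The abstract combines two ideas: propagation inside each modality-specific interaction graph, and a gated attention mechanism that weighs modalities per user. The plain-Python sketch below is a minimal, hypothetical illustration of that pattern, not MGAT's published formulation. The sigmoid gate on the user-item inner product, the softmax attention over neighbors, the residual update, and the softmax over per-modality scores are all assumptions chosen for readability, and the names `propagate_modality`, `fuse_modalities`, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_modality(user_vec: Vector, item_vecs: List[Vector]) -> Vector:
    """One propagation step for a user inside a single modality graph.

    Assumed formulation: each neighbor item contributes
    attention * gate * item_vec, where attention is a softmax over
    user-item inner products and the gate is a sigmoid of the same score.
    """
    dim = len(user_vec)
    if not item_vecs:
        return [0.0] * dim
    scores = [dot(user_vec, v) for v in item_vecs]
    attn = softmax(scores)
    out = [0.0] * dim
    for a, s, v in zip(attn, scores, item_vecs):
        gate = sigmoid(s)  # how much of this neighbor's message passes through
        for k in range(dim):
            out[k] += a * gate * v[k]
    return out


def fuse_modalities(
    user_vecs: Dict[str, Vector], neighbor_vecs: Dict[str, List[Vector]]
) -> Tuple[Vector, Dict[str, float]]:
    """Propagate within each modality graph, then weight modalities per user.

    Assumes every modality embedding has the same dimension so the
    weighted sum is defined.
    """
    modalities = sorted(user_vecs)
    updated: Dict[str, Vector] = {}
    scores: List[float] = []
    for m in modalities:
        agg = propagate_modality(user_vecs[m], neighbor_vecs[m])
        updated[m] = [x + y for x, y in zip(user_vecs[m], agg)]  # residual update
        scores.append(dot(user_vecs[m], agg))  # modality relevance to this user
    weights = dict(zip(modalities, softmax(scores)))
    dim = len(updated[modalities[0]])
    fused = [sum(weights[m] * updated[m][k] for m in modalities) for k in range(dim)]
    return fused, weights


# Hypothetical toy example: one user, two items, two modalities.
user_vecs = {"visual": [1.0, 0.0], "textual": [0.0, 1.0]}
neighbor_vecs = {
    "visual": [[1.0, 0.0], [0.0, 1.0]],
    "textual": [[0.0, 0.0], [0.0, 0.0]],
}
fused, weights = fuse_modalities(user_vecs, neighbor_vecs)
print(weights)
print(fused)
```

Unlike equal-weight aggregation over neighbors, which is the behavior the abstract attributes to MMGCN, both the neighbor attention and the modality weights in this sketch depend on the user's own embedding. In the toy example the visual graph carries a message aligned with the user's visual embedding while the textual graph carries none, so the visual modality receives the larger weight.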

Pre-training Graph Transformer with Multimodal Side Information for Recommendation

A Pre-training Strategy for Recommendation

MMGRec: Multimodal Generative Recommendation with Transformer Model

Multi-modal Recommendation Based on Knowledge Graph

A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation

PDT: Pretrained Dual Transformers for Time-aware Bipartite Graphs

Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning

MGAT: Multimodal Graph Attention Network for Recommendation

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video

GENET: Unleashing the Power of Side Information for Recommendation via Hypergraph Pre-training

Multi-Behavior Enhanced Heterogeneous Graph Convolutional Networks Recommendation Algorithm based on Feature-Interaction

MGT: Multi-Granularity Transformer Leveraging Multi-Level Relation for Sequential Recommendation

Multi-Behavior Hypergraph-Enhanced Transformer for Sequential Recommendation

GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation

Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation

Multimodal Pre-training Framework for Sequential Recommendation via Contrastive Learning

Multimodal collaborative graph for image recommendation

MPL-TransKR: Multi-Perspective Learning based on Transformer Knowledge Graph Enhanced Recommendation

Preference-corrected multimodal graph convolutional recommendation network

LightGT: A Light Graph Transformer for Multimedia Recommendation

Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey