Abstract: User interaction data in recommender systems is a form of dyadic relation that reflects users' preferences for items. Learning the representations of these two discrete sets of objects, users and items, is critical for recommendation. Recent multimodal recommendation models that leverage multimodal features (e.g., images and text descriptions) have been shown to improve recommendation accuracy. However, state-of-the-art models enhance the dyadic relations between users and items by considering either user-user or item-item relations, leaving the high-order relations of the other side (i.e., users or items) unexplored. Furthermore, we show experimentally that the multimodality fusion methods used in state-of-the-art models may degrade their recommendation performance: without any change to the model architectures, these models can achieve even better recommendation accuracy with uni-modal information. Building on this finding, we propose a model that enhances the dyadic relations by learning Dual RepresentAtions of both users and items via constructing homogeneous Graphs for multimOdal recommeNdation. We name our model DRAGON. Specifically, DRAGON constructs the user-user graph from commonly interacted items and the item-item graph from item multimodal features. It then applies graph learning to both the user-item heterogeneous graph and the homogeneous graphs (user-user and item-item) to obtain the dual representations of users and items. To capture information from each modality, DRAGON employs a simple yet effective fusion method, attentive concatenation, to derive the representations of users and items. Extensive experiments on three public datasets and seven baselines show that DRAGON can outperform the strongest baseline by 22.03% on average. Various ablation studies are conducted on DRAGON to validate its effectiveness.
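
The two homogeneous graphs described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of raw shared-item counts as user-user edge weights, cosine similarity for item features, and the `top_k` cutoff are all assumptions made for illustration.

```python
import numpy as np


def user_user_graph(interactions, top_k):
    """Hypothetical helper: link each user to the top_k users who share
    the most interacted items with them.

    interactions: binary array of shape (num_users, num_items).
    Returns {user: [(neighbor, shared_item_count), ...]}.
    """
    R = np.asarray(interactions, dtype=np.int64)
    co = R @ R.T                 # co[u, v] = number of items shared by u and v
    np.fill_diagonal(co, 0)      # ignore self-links
    graph = {}
    for u in range(co.shape[0]):
        order = np.argsort(-co[u], kind="stable")[:top_k]
        # Keep only neighbors that share at least one item.
        graph[u] = [(int(v), int(co[u, v])) for v in order if co[u, v] > 0]
    return graph


def item_item_graph(features, top_k):
    """Hypothetical helper: link each item to its top_k nearest items by
    cosine similarity of one modality's features (assumed similarity measure).

    features: array of shape (num_items, dim).
    Returns an index array of shape (num_items, top_k).
    """
    X = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.clip(norms, 1e-12, None)   # unit-normalize, guarding against zero rows
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)        # an item is never its own neighbor
    return np.argsort(-sim, axis=1, kind="stable")[:, :top_k]


# Toy data: 3 users x 3 items, and one 2-dimensional feature vector per item.
toy_interactions = [[1, 1, 0],
                    [1, 1, 1],
                    [0, 0, 1]]
toy_features = [[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0]]

uu = user_user_graph(toy_interactions, top_k=1)
ii = item_item_graph(toy_features, top_k=1)
```

In this toy example, users 0 and 1 share two items and so become each other's neighbors, while user 2 links to user 1 through the single item they share. A graph learning layer would then propagate embeddings over these edges alongside the user-item graph.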

Multimodal Multi-Graph Joint Recommendation

Multi-modal Recommendation Based on Knowledge Graph

Multimodal collaborative graph for image recommendation

Multi-modal Graph and Sequence Fusion Learning for Recommendation

Graph Neural Networks with Deep Mutual Learning for Designing Multi-modal Recommendation Systems

A multimedia recommendation model based on collaborative graph

GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation

Enhancing Dyadic Relations with Homogeneous Graphs for Multimodal Recommendation

Enhancing Recommender System with Multi-modal Knowledge Graph

Dual-view multi-modal contrastive learning for graph-based recommender systems

Multi-Behavior Enhanced Heterogeneous Graph Convolutional Networks Recommendation Algorithm based on Feature-Interaction

Attention-guided Multi-step Fusion: A Hierarchical Fusion Network for Multimodal Recommendation

MM-GEF: Multi-modal representation meet collaborative filtering

Graph Heterogeneous Multi-Relational Recommendation

MM-FRec: Multi-Modal Enhanced Fashion Item Recommendation

DiffMM: Multi-Modal Diffusion Model for Recommendation

Preference-corrected multimodal graph convolutional recommendation network

Multi-Behavior Enhanced Recommendation with Cross-Interaction Collaborative Relation Modeling

LGMRec: Local and Global Graph Learning for Multimodal Recommendation

Multimodal Difference Learning for Sequential Recommendation

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video