Multi-modal Graph and Sequence Fusion Learning for Recommendation.

Zejun Wang, Xinglong Wu, Hongwei Yang, Hui He, Yu Tai, Weizhe Zhang
DOI: https://doi.org/10.1007/978-981-99-8429-9_29
2024-01-01
Abstract: Multi-modal recommendation aims to leverage multi-modal information to mine users’ latent preferences. Existing multi-modal recommendation approaches primarily combine graph structures with multi-modal information to exploit the graph information derived from user-item interactions, overlooking the underlying sequence information. Furthermore, because they treat items solely as coarse-grained entities, they disregard the latent relationships among items within each modality, which impedes the effective extraction of latent user preferences. To address these limitations, we propose a novel approach called Multi-modal Graph and Sequence Fusion Learning Architecture for Recommendation (MMGCF). In MMGCF, we first construct dynamic item-item graphs to enhance item features and capture relationships within each modality. Subsequently, according to the influence between modalities, we design a self-attention network to fuse multi-modal features. Finally, in addition to regular graph convolution, we devise a sequence-aware learning layer that preserves and captures sequence information, allowing the model to learn user preferences from a sequential perspective. Extensive experiments conducted on three real-world datasets demonstrate the superiority of our method over various state-of-the-art baselines.
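
As a rough illustration of two ideas named in the abstract (per-modality item-item graphs and self-attention fusion across modalities), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the helper names (`knn_item_graph`, `enhance`, `fuse_modalities`), the use of cosine k-nearest-neighbour graphs, mean aggregation over neighbours, and attention with identity query/key/value projections in place of learned ones. The dynamic graph updates and the sequence-aware layer are not reproduced.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_item_graph(features, k):
    """Assumed graph construction: link each item to its k most similar items."""
    graph = {}
    for i, fi in enumerate(features):
        sims = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        sims.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first
        graph[i] = [j for _, j in sims[:k]]
    return graph

def enhance(features, graph):
    """One propagation step: average an item's features with its neighbours'."""
    out = []
    for i, fi in enumerate(features):
        group = [fi] + [features[j] for j in graph[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_modalities(modal_vectors):
    """Scaled dot-product self-attention over one item's modality vectors,
    followed by mean pooling. Projections are identity (an assumption)."""
    d = len(modal_vectors[0])
    attended = []
    for query in modal_vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in modal_vectors]
        weights = softmax(scores)
        attended.append([sum(w * v[c] for w, v in zip(weights, modal_vectors))
                         for c in range(d)])
    return [sum(col) / len(attended) for col in zip(*attended)]

# Toy data: three items, two modalities, two-dimensional features (all made up).
visual = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
textual = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]

visual_graph = knn_item_graph(visual, k=1)
print(visual_graph)  # {0: [1], 1: [0], 2: [1]}

visual_enh = enhance(visual, visual_graph)
textual_enh = enhance(textual, knn_item_graph(textual, k=1))
fused_items = [fuse_modalities([v, t]) for v, t in zip(visual_enh, textual_enh)]
```

In this sketch each modality gets its own graph, so two items can be neighbours visually without being neighbours textually; fusion happens only after the per-modality enhancement step.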