MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation

Jinpeng Wang,Ziyun Zeng,Yunxiao Wang,Yuting Wang,Xingyu Lu,Tianxiang Li,Jun Yuan,Rui Zhang,Hai-Tao Zheng,Shu-Tao Xia
DOI: https://doi.org/10.1145/3581783.3611967
2023-10-23
Abstract:The goal of sequential recommendation (SR) is to predict a user's potential interested items based on her/his historical interaction sequences. Most existing sequential recommenders are developed based on ID features, which, despite their widespread use, often underperform with sparse IDs and struggle with the cold-start problem. Besides, inconsistent ID mappings hinder the model's transferability, isolating similar recommendation domains that could have been co-optimized. This paper aims to address these issues by exploring the potential of multi-modal information in learning robust and generalizable sequence representations. We propose MISSRec, a multi-modal pre-training and transfer learning framework for SR. On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests while a novel interest-aware decoder is developed to grasp item-modality-interest relations for better sequence representation. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation, providing more precise matching between users and items. We pre-train the model with contrastive learning objectives and fine-tune it in an efficient manner. Extensive experiments demonstrate the effectiveness and flexibility of MISSRec, promising a practical solution for real-world recommendation scenarios. Data and code are available on \url{<a class="link-external link-https" href="https://github.com/gimpong/MM23-MISSRec" rel="external noopener nofollow">this https URL</a>}.
Information Retrieval,Artificial Intelligence,Multimedia
What problem does this paper attempt to address?