MMGRec: Multimodal Generative Recommendation with Transformer Model

Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie
2024-04-25
Abstract: Multimodal recommendation aims to recommend user-preferred candidates based on the user's historically interacted items and the associated multimodal information. Previous studies commonly employ an embed-and-retrieve paradigm: learning user and item representations in the same embedding space, then retrieving similar candidate items for a user via embedding inner product. However, this paradigm suffers from inference cost, interaction modeling, and false-negative issues. To this end, we propose a new model, MMGRec, that introduces a generative paradigm into multimodal recommendation. Specifically, we first devise a hierarchical quantization method, Graph RQ-VAE, to assign a Rec-ID to each item from its multimodal and CF information. Consisting of a tuple of semantically meaningful tokens, the Rec-ID serves as the unique identifier of each item. Afterward, we train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences. This qualifies as a generative paradigm because the model predicts the tuple of tokens identifying the recommended item in an autoregressive manner. Moreover, a relation-aware self-attention mechanism is devised for the Transformer to handle non-sequential interaction sequences; it explores pairwise relations between elements to replace absolute positional encoding. Extensive experiments evaluate MMGRec's effectiveness compared with state-of-the-art methods.
Information Retrieval
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses key issues in multimodal recommendation systems, particularly those tied to the "embed-and-retrieve" paradigm that existing methods commonly adopt. Specifically:

1. **High Inference Cost**: As the number of users and items increases, the time complexity of methods based on inner-product similarity calculations rises significantly, which hurts recommendation efficiency.
2. **Insufficient Interaction Modeling**: A linear inner product cannot fully model the complex interaction structure between users and items. Some studies use neural networks or metric learning to improve on this, but they sacrifice the speed of similarity calculation.
3. **False Negative Problem**: The paradigm assumes that interacted items are closer to the user's preferences than non-interacted items. In reality, non-interaction does not necessarily mean dislike.

To address these issues, the paper proposes a new multimodal generative recommendation model, MMGRec, which introduces a generative paradigm. MMGRec does so by:

- Designing a new item identifier, Rec-ID, which includes semantic information and popularity information.
- Proposing a Graph Residual Quantization Variational Autoencoder (Graph RQ-VAE) to assign Rec-IDs.
- Designing a relation-aware self-attention mechanism for generating Rec-IDs, which compensates for the lack of positional information in historical interaction sequences.

Experimental results on three public datasets show that MMGRec outperforms existing state-of-the-art methods in both performance and inference efficiency.
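
To make the idea of a Rec-ID as a tuple of tokens more concrete, the following is a minimal sketch of plain residual quantization, the building block that Graph RQ-VAE is named after. It is not the paper's implementation: the graph encoder, the learned codebooks, the training loss, and the step that guarantees each item a unique identifier are all omitted. The function names (`nearest_code`, `residual_quantize`), the hand-picked codebooks, and the toy item vector are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def residual_quantize(
    vec: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[Tuple[int, ...], List[float]]:
    """Quantize vec level by level.

    At each level, pick the nearest code for the current residual, record its
    index as one token, and subtract the chosen code so the next level only
    has to explain what is left over.
    """
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(tokens), residual


# Hypothetical two-level codebooks (hand-picked here; learned in a real RQ-VAE).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],     # level 1: coarse codes
    [[0.0, 0.0], [0.25, -0.25]],  # level 2: finer corrections
]

# Hypothetical item representation (in the paper this would come from an
# encoder over multimodal and collaborative-filtering information).
item_vector = [1.2, 0.8]

tokens, leftover = residual_quantize(item_vector, codebooks)
print("token tuple:", tokens)
print("remaining residual:", leftover)
```

Each level contributes one token, so items whose representations are close tend to share leading tokens and differ only in later ones. This hierarchical structure is what allows a sequence model to generate an identifier token by token instead of scoring every candidate item.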