MMGRec: Multimodal Generative Recommendation with Transformer Model

Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie

2024-04-25

Abstract: Multimodal recommendation aims to recommend user-preferred candidates based on her/his historically interacted items and associated multimodal information. Previous studies commonly employ an embed-and-retrieve paradigm: learning user and item representations in the same embedding space, then retrieving similar candidate items for a user via embedding inner product. However, this paradigm suffers from inference cost, interaction modeling, and false-negative issues. Toward this end, we propose a new MMGRec model to introduce a generative paradigm into multimodal recommendation. Specifically, we first devise a hierarchical quantization method, Graph RQ-VAE, to assign a Rec-ID to each item from its multimodal and CF information. Consisting of a tuple of semantically meaningful tokens, the Rec-ID serves as the unique identifier of each item. Afterward, we train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences. The generative paradigm is qualified since this model systematically predicts the tuple of tokens identifying the recommended item in an autoregressive manner. Moreover, a relation-aware self-attention mechanism is devised for the Transformer to handle non-sequential interaction sequences, which explores the element pairwise relation to replace absolute positional encoding. Extensive experiments evaluate MMGRec's effectiveness compared with state-of-the-art methods.
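To make the idea of a Rec-ID as a tuple of tokens more concrete, the following is a minimal sketch of plain residual quantization. It is not the paper's Graph RQ-VAE: the codebooks here are fixed, hand-written values rather than learned ones, there is no graph encoder, and nothing is done to resolve two items that map to the same tuple. The names `nearest_code`, `residual_quantize`, and `CODEBOOKS` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def residual_quantize(
    vec: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[int, ...]:
    """Quantize vec level by level; each level encodes the previous residual."""
    residual: List[float] = list(vec)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        tokens.append(idx)
        # Subtract the chosen code so the next level refines what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(tokens)


# Hypothetical two-level codebooks (assumed values, not from the paper).
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],      # level 1: coarse codes
    [(0.0, 0.0), (0.25, -0.25)],   # level 2: fine codes applied to the residual
]

if __name__ == "__main__":
    item_embedding = (1.2, 0.8)  # stand-in for a fused item representation
    print(residual_quantize(item_embedding, CODEBOOKS))
```

Each level picks the code nearest to whatever the previous levels left unexplained, so earlier tokens carry coarse information and later tokens carry finer detail.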

Information Retrieval

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses key issues in multimodal recommendation systems, particularly those of the "embed-and-retrieve" paradigm commonly adopted by existing methods:

1. **High Inference Cost**: As the number of users and items increases, the time complexity of methods based on inner-product similarity rises significantly, reducing recommendation efficiency.
2. **Insufficient Interaction Modeling**: A linear inner product cannot fully model the complex interaction structure between users and items. Some studies use neural networks or metric learning to improve on this, but they sacrifice the speed of similarity calculation.
3. **False-Negative Problem**: The paradigm assumes that interacted items are closer to the user's preferences than non-interacted items. In reality, non-interaction does not necessarily mean dislike.

To address these issues, the paper proposes MMGRec, a multimodal recommendation model that introduces a generative paradigm. MMGRec does so through the following means:

- Designing a new item identifier, Rec-ID, which includes semantic information and popularity information.
- Proposing a Graph Residual Quantization Variational Autoencoder (Graph RQ-VAE) to assign Rec-IDs.
- Designing a relation-aware self-attention mechanism for generating Rec-IDs, which compensates for the lack of positional information in historical interaction sequences.

Experimental results on three public datasets show that MMGRec outperforms existing state-of-the-art methods in both performance and inference efficiency.
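The generative step summarized above, predicting an item's token tuple one token at a time, can be illustrated with a small sketch. This is an assumption-laden toy and not the paper's decoder: `toy_score` stands in for a trained Transformer's next-token score, `VALID_IDS` is a made-up catalogue, and greedy decoding is used for brevity. The prefix check scans every known Rec-ID, which keeps the example short but is not how an efficient system would be built.

```python
from typing import Callable, Dict, Tuple

RecID = Tuple[int, ...]


def greedy_generate(
    score_next: Callable[[RecID, int], float],
    valid_ids: Dict[RecID, str],
    length: int,
) -> RecID:
    """Build a Rec-ID token by token, allowing only prefixes of real items."""
    prefix: RecID = ()
    for _ in range(length):
        # Tokens that extend the current prefix toward at least one known item.
        candidates = sorted(
            {rid[len(prefix)] for rid in valid_ids if rid[: len(prefix)] == prefix}
        )
        best = max(candidates, key=lambda tok: score_next(prefix, tok))
        prefix = prefix + (best,)
    return prefix


# Hypothetical catalogue: three items, each identified by a 2-token Rec-ID.
VALID_IDS: Dict[RecID, str] = {
    (0, 0): "item_a",
    (0, 1): "item_b",
    (1, 0): "item_c",
}

# Hypothetical next-token scores conditioned on the prefix generated so far.
_TOY_TABLE: Dict[RecID, Dict[int, float]] = {
    (): {0: 0.2, 1: 0.8},
    (0,): {0: 0.5, 1: 0.6},
    (1,): {0: 0.9, 1: 0.95},
}


def toy_score(prefix: RecID, token: int) -> float:
    """Stand-in for a model's score of `token` given `prefix`."""
    return _TOY_TABLE[prefix][token]


if __name__ == "__main__":
    rec_id = greedy_generate(toy_score, VALID_IDS, length=2)
    print(rec_id, VALID_IDS[rec_id])
```

In this toy, the second step would prefer token 1 if unconstrained, but no item has the Rec-ID (1, 1), so decoding is restricted to (1, 0). Restricting candidates to valid prefixes is one simple way to guarantee that the generated tuple identifies an existing item.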

A Dynamic Collaborative Recommendation Method Based on Multimodal Fusion

MGT: Multi-Granularity Transformer Leveraging Multi-Level Relation for Sequential Recommendation

GenRec: Generative Sequential Recommendation with Large Language Models

MMRec: Simplifying Multimodal Recommendation

MMCRec: Towards Multi-modal Generative AI in Conversational Recommendation

TransRec: Learning Transferable Recommendation from Mixture-of-Modality Feedback

A Novel Multi-modal Recommender with Modality-Specific Graph Refining Strategy

Multi-Modality is All You Need for Transferable Recommender Systems

Multi-Grained Preference Enhanced Transformer for Multi-Behavior Sequential Recommendation

Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation

Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation

Multimodal Difference Learning for Sequential Recommendation

MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation

Multimodal Conditioned Diffusion Model for Recommendation

GMiRec: A Multi-image Visual Recommendation Model Based on a Gated Neural Network

When Multi-Level Meets Multi-Interest: A Multi-Grained Neural Model for Sequential Recommendation

Multi-Behavior Sequential Transformer Recommender

BiVRec: Bidirectional View-based Multimodal Sequential Recommendation

MaTrRec: Uniting Mamba and Transformer for Sequential Recommendation

Multimodal Multi-Graph Joint Recommendation