Learning ID-free Item Representation with Token Crossing for Multimodal Recommendation

Kangning Zhang, Jiarui Jin, Yingjie Qin, Ruilong Su, Jianghao Lin, Yong Yu, Weinan Zhang
2024-10-25
Abstract: Current multimodal recommendation models have extensively explored the effective utilization of multimodal information; however, their reliance on ID embeddings remains a performance bottleneck. Even with the assistance of multimodal information, optimizing ID embeddings remains challenging for ID-based multimodal recommenders when interaction data is sparse. Furthermore, the item-specific nature of ID embeddings hinders information exchange among related items, and the space required by ID embeddings grows with the number of items. To address these limitations, we propose an ID-free MultimOdal TOken Representation scheme named MOTOR that represents each item using learnable multimodal tokens and connects items through shared tokens. Specifically, we first employ product quantization to discretize each item's multimodal features (e.g., images, text) into discrete token IDs. We then interpret the token embeddings corresponding to these token IDs as implicit item features, introducing a new Token Cross Network to capture the implicit interaction patterns among these tokens. The resulting representations can replace the original ID embeddings and transform the original ID-based multimodal recommender into an ID-free system, without introducing any additional loss design. MOTOR reduces the overall space requirements of these models, facilitates information interaction among related items, and significantly enhances the model's recommendation capability. Extensive experiments on nine mainstream models demonstrate the significant performance improvement achieved by MOTOR, highlighting its effectiveness in enhancing multimodal recommendation systems.
Information Retrieval
What problem does this paper attempt to address?
This paper addresses several key problems in current multimodal recommendation systems:

1. **Information silos**: Each item has its own independent ID embedding, which hinders information exchange between related items.
2. **Cold-start problem**: For new items with very little interaction data, ID embeddings are difficult to optimize.
3. **Storage burden**: As the number of items grows, the storage required for ID embeddings grows with it.

To solve these problems, the authors propose an ID-free multimodal token representation scheme (MOTOR). MOTOR is implemented through the following steps:

- **Feature discretization**: First, Optimized Product Quantization (OPQ) discretizes the multimodal features (such as images and text) of each item into discrete token IDs.
- **Token embedding**: The token embeddings corresponding to these token IDs are then interpreted as implicit item features, and a new Token Cross Network captures the implicit interaction patterns between these tokens.
- **Replace ID embeddings**: The resulting representation can replace the original ID embeddings, converting the ID-based multimodal recommendation system into an ID-free system without introducing any additional loss design.

The main contributions of MOTOR include:

- **Innovative ID-free multimodal token representation**: According to the summary, this is the first time that quantization techniques have been applied to multimodal recommendation systems to learn item representations through learnable multimodal token crossing.
- **Lightweight Token Cross Network**: A lightweight network is designed to explore the interactions between tokens, and the performance of modality-specific and cross-modality Token Cross Networks is evaluated experimentally.
- **Significant performance improvement**: Extensive experiments on nine mainstream models show that MOTOR can significantly improve recommendation performance on both long-tail and popular items, and that it is compatible with multiple multimodal recommendation models without introducing additional loss design.

Through these methods, MOTOR not only reduces the overall space requirements of the model but also promotes information exchange between related items, significantly enhancing the model's recommendation ability.
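
The discretization and token-sharing steps described above can be illustrated with a small sketch. This is not the authors' implementation. It assumes plain product quantization with a hand-written k-means (the summary mentions OPQ, which additionally learns a rotation of the feature space), random vectors in place of real image or text features, and simple sum pooling where the paper uses its Token Cross Network. All function names and sizes below are hypothetical and chosen only for illustration.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to p in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(vec, m):
    """Split a vector into m equal-length sub-vectors."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def fit_pq(features, m, k):
    """Train one k-means codebook per sub-space (plain PQ, no OPQ rotation)."""
    subs = [split(f, m) for f in features]
    return [kmeans([s[i] for s in subs], k, seed=i) for i in range(m)]

def encode(vec, codebooks):
    """Map a feature vector to m token IDs, one per sub-space."""
    parts = split(vec, len(codebooks))
    return [nearest(sub, cb) for sub, cb in zip(parts, codebooks)]

def item_representation(token_ids, token_tables):
    """Sum-pool token embeddings (assumed stand-in for the Token Cross Network)."""
    dim = len(token_tables[0][0])
    out = [0.0] * dim
    for table, t in zip(token_tables, token_ids):
        for d in range(dim):
            out[d] += table[t][d]
    return out

# Hypothetical sizes: 200 items, 8-dim features, 4 sub-spaces, 16 tokens each.
rng = random.Random(42)
n_items, feat_dim, m, k, emb_dim = 200, 8, 4, 16, 4
features = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(n_items)]

codebooks = fit_pq(features, m, k)
item_tokens = [encode(f, codebooks) for f in features]

# One learnable embedding table per sub-space, shared by all items.
token_tables = [[[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(k)]
                for _ in range(m)]
rep = item_representation(item_tokens[0], token_tables)

id_params = n_items * emb_dim    # one embedding row per item: 800 floats
token_params = m * k * emb_dim   # shared token tables: 256 floats
```

In this toy setup the shared token tables hold fewer floats than a per-item ID table, and their size does not depend on the number of items. Each item still needs its `m` integer token IDs stored. Items whose features fall into the same cluster in a sub-space share that token's embedding, which is the sketch's analogue of the information exchange among related items that the summary describes.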