Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation

Yueqi Wang, Zhenrui Yue, Huimin Zeng, Dong Wang, Julian McAuley
2024-10-02
Abstract: Despite recent advancements in language and vision modeling, integrating rich multimodal knowledge into recommender systems continues to pose significant challenges. This is primarily due to the need for efficient recommendation, which requires adaptive and interactive responses. In this study, we focus on sequential recommendation and introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec). Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions. To integrate item features from diverse modalities, fMRLRec employs a simple mapping to project multimodal item features into an aligned feature space. Additionally, we design an efficient linear transformation that embeds smaller features into larger ones, substantially reducing memory requirements for large-scale training on recommendation data. Combined with improved state space modeling techniques, fMRLRec scales to different dimensions and only requires one-time training to produce multiple models tailored to various granularities. We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets, where it consistently achieves superior performance over state-of-the-art baseline methods. We make our code and data publicly available at https://github.com/yueqirex/fMRLRec.
Information Retrieval
What problem does this paper attempt to address?
This paper addresses how to integrate rich multimodal knowledge into recommender systems efficiently, improving both recommendation performance and efficiency. Specifically, it proposes a lightweight framework, full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec), which produces multiple models from a single training run. These models suit different deployment scenarios, such as different model or dimension sizes, reducing memory and computational costs while maintaining performance.

### Background and Motivation of the Paper

Although language and vision models have made remarkable progress in recent years, effectively integrating their multimodal knowledge into recommender systems remains challenging. The main reason is that recommendation must be efficient, which requires models that respond adaptively and interactively. Existing multimodal recommendation methods are effective, but different recommendation scenarios (such as centralized or federated recommender systems) often call for different granularities, that is, different model or dimension sizes, to balance performance and efficiency. For example, larger dimensions are usually used to encode language and vision features for fine-grained understanding and generation tasks, while smaller feature sizes can significantly reduce the demand for computational resources at the cost of a slight decrease in performance.

### Solutions

To address these problems, the paper proposes the fMRLRec framework. Its core features are:

1. **Multi-granularity Representation Learning**: fMRLRec captures item features at different granularities, and the learned informative representations support efficient recommendation across multiple dimensions.
2. **Multimodal Feature Alignment**: A simple mapping projects multimodal item features into an aligned feature space, so that data from different modalities can be processed uniformly.
3. **Efficient Linear Transformation**: An efficient linear transformation embeds smaller features into larger ones, substantially reducing memory requirements during large-scale training.
4. **One-time Training, Multiple Deployments**: Combined with improved state space modeling techniques, fMRLRec needs only one training run to produce multiple models tailored to different granularities, meeting the requirements of different scenarios.

### Experimental Results

The paper evaluates the effectiveness and efficiency of fMRLRec on multiple benchmark datasets. The results show that fMRLRec outperforms existing state-of-the-art baseline methods on multiple metrics. The advantage is most pronounced on sparse datasets (such as Clothing and Sports), where fMRLRec achieves an average performance improvement of more than 21%. In addition, fMRLRec maintains high performance across different model sizes, giving developers with limited computational resources a flexible choice of model size.

### Conclusions

Overall, by producing multiple models from a single training run, fMRLRec not only improves recommendation performance but also significantly reduces the computational costs of training and inference, offering a new solution for the practical application of multimodal recommender systems.
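
The "Efficient Linear Transformation" and "One-time Training, Multiple Deployments" items under Solutions can be illustrated with a minimal sketch. It assumes one particular way of embedding smaller features into larger ones: a block mask under which each nested chunk of output units reads only the input units inside its own nested size. The summary does not specify the authors' exact operator, so this mask, the nested sizes `[2, 4, 8]`, and the helper names `nested_mask`, `make_weights`, `slice_model`, and `matvec` are all hypothetical. The point is only that a single shared weight matrix can be sliced into smaller standalone models whose outputs match the leading dimensions of the full model.

```python
import random


def nested_mask(sizes):
    """Build a d x d 0/1 mask, with d = sizes[-1].

    Rows belonging to the chunk that ends at nested size m may read only
    the first m input columns. (Assumed masking scheme, for illustration.)
    """
    d = sizes[-1]
    mask = [[0.0] * d for _ in range(d)]
    start = 0
    for m in sizes:
        for i in range(start, m):
            for j in range(m):
                mask[i][j] = 1.0
        start = m
    return mask


def make_weights(sizes, seed=0):
    """Draw one shared weight matrix and apply the nested mask to it."""
    rng = random.Random(seed)
    mask = nested_mask(sizes)
    d = sizes[-1]
    return [[rng.gauss(0.0, 1.0) * mask[i][j] for j in range(d)]
            for i in range(d)]


def slice_model(W, m):
    """Extract the top-left m x m block as a standalone smaller model."""
    return [row[:m] for row in W[:m]]


def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


sizes = [2, 4, 8]                    # hypothetical nested dimensions
W = make_weights(sizes)              # stored once, shared by all sizes
x = [0.5 * k for k in range(sizes[-1])]
full = matvec(W, x)

for m in sizes:
    small = matvec(slice_model(W, m), x[:m])
    # The size-m model reproduces the first m outputs of the full model.
    assert all(abs(a - b) < 1e-9 for a, b in zip(small, full[:m]))
```

Under this assumed scheme, deploying a smaller model amounts to slicing the stored matrix rather than training or storing a separate set of weights for each size, which is one way to read the memory saving described above.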