A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios

Christian Ganhör, Marta Moscati, Anna Hausberger, Shah Nawaz, Markus Schedl
DOI: https://doi.org/10.1145/3640457.3688009
2024-09-26
Abstract: Most recommender systems adopt collaborative filtering (CF) and provide recommendations based on past collective interactions. Therefore, the performance of CF algorithms degrades when few or no interactions are available, a scenario referred to as cold-start. To address this issue, previous work relies on models leveraging both collaborative data and side information on the users or items. Similar to multimodal learning, these models aim at combining collaborative and content representations in a shared embedding space. In this work we propose a novel technique for multimodal recommendation, relying on a multimodal Single-Branch embedding network for Recommendation (SiBraR). Leveraging weight-sharing, SiBraR encodes interaction data as well as multimodal side information using the same single-branch embedding network on different modalities. This makes SiBraR effective in scenarios of missing modality, including cold start. Our extensive experiments on large-scale recommendation datasets from three different recommendation domains (music, movie, and e-commerce) and providing multimodal content information (audio, text, image, labels, and interactions) show that SiBraR significantly outperforms CF as well as state-of-the-art content-based RSs in cold-start scenarios, and is competitive in warm scenarios. We show that SiBraR's recommendations are accurate in missing modality scenarios, and that the model is able to map different modalities to the same region of the shared embedding space, hence reducing the modality gap.
Information Retrieval, Artificial Intelligence, Machine Learning, Multimedia
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the cold-start and missing-modality problems in recommender systems. Specifically:

1. **Cold-Start Problem**: Most recommender systems use collaborative filtering (CF), which recommends items based on users' past collective interactions. The performance of CF algorithms degrades when few or no interactions are available for a user or an item, a scenario known as cold-start. The problem is most visible when new users or new items join the system.
2. **Missing Modality Problem**: In multimodal recommender systems, data for some modalities may be missing, for example items that lack images or text descriptions. The challenge is to still produce good recommendations from the modalities that are available.

To address these issues, the authors propose a multimodal Single-Branch embedding network for Recommendation (SiBraR). SiBraR uses weight sharing: the same single-branch embedding network encodes interaction data and multimodal side information (such as audio, text, and images) into one shared embedding space, which makes it effective in cold-start and missing-modality scenarios.

### Main Contributions

1. **Proposing SiBraR**: A new content-based recommender system that leverages multimodal information and provides effective recommendations in standard, cold-start, and missing-modality scenarios.
2. **Extensive Experimental Evaluation**: Quantitative experiments on large-scale recommendation datasets from three domains (music, movie, and e-commerce) evaluate the recommendation accuracy of SiBraR against CF and state-of-the-art content-based methods.
3. **Analysis of Missing Modality Impact**: An investigation of how missing modalities affect the performance of SiBraR.
4. **Shared Embedding Space Analysis**: A demonstration that SiBraR maps different modalities to the same region of the shared embedding space, thereby reducing the modality gap.

Overall, SiBraR significantly outperforms CF and state-of-the-art content-based recommender systems in cold-start scenarios and remains competitive in warm-start scenarios.
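
The weight-sharing idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: the layer sizes, the modality-specific input projections, the averaging of per-modality embeddings, and the dot-product score are all assumptions made for illustration, and the weights are random rather than trained. It only shows how one shared branch can embed whichever modalities happen to be present, so that an item without interaction data can still be placed in the shared space.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8    # size of the shared embedding space (hypothetical)
HIDDEN_DIM = 16  # input width of the shared branch (hypothetical)

# Hypothetical feature sizes for each item modality.
MODALITY_DIMS = {"interactions": 50, "audio": 20, "text": 30}

# Assumption: a per-modality linear projection brings inputs of
# different sizes to a common width before the shared branch.
projections = {
    name: rng.normal(scale=0.1, size=(dim, HIDDEN_DIM))
    for name, dim in MODALITY_DIMS.items()
}

# The single branch: one set of weights shared by all modalities.
W_shared = rng.normal(scale=0.1, size=(HIDDEN_DIM, EMBED_DIM))
b_shared = np.zeros(EMBED_DIM)


def embed(modality, features):
    """Embed one modality's feature vector with the shared branch."""
    hidden = np.maximum(features @ projections[modality], 0.0)  # ReLU
    return hidden @ W_shared + b_shared


def embed_item(item_features):
    """Average the embeddings of the modalities that are present.

    A missing modality is passed as None and simply skipped.
    """
    available = [
        embed(name, x) for name, x in item_features.items() if x is not None
    ]
    if not available:
        raise ValueError("at least one modality is required")
    return np.mean(available, axis=0)


def score(user_embedding, item_embedding):
    """Dot-product relevance score (an assumed scoring function)."""
    return float(user_embedding @ item_embedding)


# A cold-start item: no interactions yet, but audio and text exist.
cold_item = {
    "interactions": None,
    "audio": rng.normal(size=MODALITY_DIMS["audio"]),
    "text": rng.normal(size=MODALITY_DIMS["text"]),
}
user = rng.normal(size=EMBED_DIM)  # stand-in for a learned user embedding
cold_item_embedding = embed_item(cold_item)
cold_item_score = score(user, cold_item_embedding)
```

Because every modality passes through `W_shared`, the cold-start item receives an embedding in the same space as items that do have interaction data. In a trained model, the objective would also need to encourage embeddings of different modalities of the same item to lie close together; that part is omitted here.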