AlignRec: Aligning and Training in Multimodal Recommendations

Yifan Liu, Kangning Zhang, Xiangyuan Ren, Yanhua Huang, Jiarui Jin, Yingjie Qin, Ruilong Su, Ruiwen Xu, Yong Yu, Weinan Zhang
2024-08-01
Abstract: With the development of multimedia systems, multimodal recommendations are playing an essential role, as they can leverage rich contexts beyond interactions. Existing methods mainly treat multimodal information as auxiliary, using it to help learn ID features. However, there exist semantic gaps between multimodal content features and ID-based features, so directly using multimodal information as an auxiliary signal leads to misalignment in the representations of users and items. In this paper, we first systematically investigate the misalignment issue in multimodal recommendations and propose a solution named AlignRec. In AlignRec, the recommendation objective is decomposed into three alignments, namely alignment within contents, alignment between content and categorical ID, and alignment between users and items. Each alignment is characterized by a specific objective function and is integrated into our multimodal recommendation framework. To train AlignRec effectively, we propose first pre-training the first alignment to obtain unified multimodal features, and then training the other two alignments jointly with these features as input. Because it is essential to analyze whether each multimodal feature helps training, and to accelerate the iteration cycle of recommendation models, we design three new classes of metrics to evaluate intermediate performance. Our extensive experiments on three real-world datasets consistently verify the superiority of AlignRec compared to nine baselines. We also find that the multimodal features generated by AlignRec are better than the ones currently in use; they will be open-sourced in our repository: https://github.com/sjtulyf123/AlignRec_CIKM24.
Information Retrieval, Machine Learning
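The abstract states that each alignment has its own objective function, and the summary further on describes the content-category alignment as contrastive and the user-item alignment as based on cosine similarity. The NumPy sketch below illustrates what such losses could look like. It is not the authors' implementation: the in-batch InfoNCE form, the symmetric averaging, the temperature `tau`, and the `1 - cosine` form of the user-item loss are assumptions, and the helper names (`info_nce`, `cca_loss`, `uia_loss`) are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce(anchor: np.ndarray, positive: np.ndarray, tau: float = 0.2) -> float:
    """In-batch InfoNCE: row k of `anchor` should match row k of `positive`;
    every other row in the batch acts as a negative."""
    a, p = l2_normalize(anchor), l2_normalize(positive)
    logits = a @ p.T / tau                          # (B, B) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))      # positive pairs lie on the diagonal


def cca_loss(id_emb: np.ndarray, content_emb: np.ndarray, tau: float = 0.2) -> float:
    """Hypothetical content-category alignment: pull each entity's ID embedding
    toward its own multimodal content embedding, in both directions."""
    return 0.5 * (info_nce(id_emb, content_emb, tau) + info_nce(content_emb, id_emb, tau))


def uia_loss(user_emb: np.ndarray, item_emb: np.ndarray) -> float:
    """Hypothetical user-item alignment: 1 - mean cosine similarity over observed
    (user, item) interaction pairs; row k of each array is one pair."""
    cos = np.sum(l2_normalize(user_emb) * l2_normalize(item_emb), axis=1)
    return float(1.0 - cos.mean())


# Toy usage: content features close to the ID embeddings should give a lower
# CCA loss than unrelated random content features.
rng = np.random.default_rng(0)
ids = rng.normal(size=(8, 16))
content_aligned = ids + 0.01 * rng.normal(size=(8, 16))
content_random = rng.normal(size=(8, 16))
print(cca_loss(ids, content_aligned), cca_loss(ids, content_random))
```

In-batch negatives are a common choice for contrastive objectives because they need no extra sampling step; whether AlignRec samples negatives this way is not stated here.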
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper studies the misalignment issue in multimodal recommendation systems and proposes a new method called AlignRec. Specifically:

1. **Misalignment issue**: Existing multimodal recommendation methods mainly use image and text information as auxiliary features to help learn ID features. However, there is a semantic gap between multimodal content features and ID-based features, so directly using multimodal information can lead to misaligned user and item representations.
2. **Solution**: The paper systematically studies misalignment in multimodal recommendation and proposes AlignRec, which decomposes the recommendation objective into three alignment tasks:
   - Inter-Content Alignment (ICA): unifying the representations of different modalities through a cross-modal encoder.
   - Content-Category Alignment (CCA): using contrastive learning to narrow the gap between multimodal content features and user/item ID features.
   - User-Item Alignment (UIA): aligning users with the items they have interacted with via cosine similarity.
3. **Training strategy**: To train AlignRec effectively, the authors first pre-train the inter-content alignment task, then use the resulting multimodal features as input for joint training of the other two alignment tasks.
4. **Evaluation protocols**: The paper also designs three new intermediate evaluation protocols that directly assess the quality of multimodal features: zero-shot evaluation, item-based collaborative filtering, and masked modality recommendation. These protocols help select better multimodal encoders and reduce the complexity of hyperparameter search.

Through these methods, the paper aims to improve the performance of multimodal recommendation systems, especially for long-tail items and cold-start scenarios. Experimental results show that AlignRec outperforms nine baseline methods on three real-world datasets.
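Item-based collaborative filtering, one of the intermediate protocols listed above, lets multimodal features be judged without training a full recommender. The sketch below is a minimal illustration under stated assumptions: each item is scored by its summed cosine similarity to the user's history items, already-seen items are excluded, and Recall@K is the metric. The function names and toy features are hypothetical, and the paper's exact protocol may differ.

```python
import numpy as np


def item_cf_scores(item_feats: np.ndarray, history: list[int]) -> np.ndarray:
    """Score every item by its summed cosine similarity to the user's history
    items, using frozen multimodal features only (no trained ID embeddings)."""
    norms = np.maximum(np.linalg.norm(item_feats, axis=1, keepdims=True), 1e-12)
    f = item_feats / norms
    scores = (f @ f[history].T).sum(axis=1)   # shape: (num_items,)
    scores[history] = -np.inf                 # never recommend already-seen items
    return scores


def recall_at_k(scores: np.ndarray, held_out: set[int], k: int) -> float:
    """Fraction of held-out items that appear among the top-k ranked items."""
    top_k = np.argsort(-scores)[:k]
    return len(held_out.intersection(top_k.tolist())) / len(held_out)


# Toy usage: two clusters of item features; the user has seen items 0 and 1.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
                  [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
scores = item_cf_scores(feats, history=[0, 1])
print(recall_at_k(scores, held_out={2}, k=1))  # prints 1.0
```

Because the scoring uses only frozen item features, features from a different multimodal encoder can be swapped in and the metric re-run without retraining, which matches the faster iteration cycle the evaluation protocols aim for.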