Multimodal Conditioned Diffusion Model for Recommendation

Haokai Ma,Yimeng Yang,Ruobing Xie,Xiangxu Meng,Lei Meng
DOI: https://doi.org/10.1145/3589335.3651956
2024-05-13
Abstract:Multimodal recommendation aims at to modeling the feature distributions of items by using their multi-modal information. Prior efforts typically focus on the denoising of the user-item graph with a degree-sensitive strategy, which may not well-handle the users' consistent preference across modalities. More importantly, it has been observed that existing methods may learn ill-posed item embeddings due to their focus on a specific auxiliary optimization task for multimodal representations rather than explicitly modeling them. This paper therefore presents a solution that takes the advantages of the explicit uncertainty injection ability of Diffusion Model (DM) for the modeling and fusion of multi-modal information. Specifically, we propose a novel Multimodal Conditioned Diffusion Model for Recommendation (MCDRec), which tailors DM with two technical modules to model the high-order multimodal knowledge. The first module is multimodal-conditioned representation diffusion (MRD), which integrates pre-extracted multimodal knowledge into the item representation modeling via a tailored DM. This smoothly bridges the insurmountable gap between the multi-modal content features and the collaborative signals. Secondly, with the diffusion-guided graph denoising (DGD) module, MCDRec may effectively denoise the user-item graph by filtering the occasional interactions in user historical behaviors. This is achieved with the power of DM in aligning the users' collaborative preferences with their shared items' content information. Extensive experiments compared to several SOTA baselines on two real-word datasets demonstrate the effectiveness of MCDRec. The specific visualization also reveals the potential of MRD to precisely handling the high-order representation correlations among the user embeddings and the multi-modal heterogeneous representations of items.
Computer Science
What problem does this paper attempt to address?