Multimodality Invariant Learning for Multimedia-Based New Item Recommendation

Haoyue Bai, Le Wu, Min Hou, Miaomiao Cai, Zhuangzhuang He, Yuyang Zhou, Richang Hong, Meng Wang
DOI: https://doi.org/10.1145/3626772.3658596
2024-01-01
Abstract: Multimedia-based recommendation provides personalized item suggestions by learning the content preferences of users. With the proliferation of digital devices and APPs, a huge number of new items are created rapidly over time. How to quickly provide recommendations for new items at the inference time is challenging. What's worse, real-world items exhibit varying degrees of modality missing (e.g., many short videos are uploaded without text descriptions). Though many efforts have been devoted to multimedia-based recommendations, they either could not deal with new multimedia items or assumed the modality completeness in the modeling process. In this paper, we highlight the necessity of tackling the modality missing issue for new item recommendation. We argue that users' inherent content preference is stable and better kept invariant to arbitrary modality missing environments. Therefore, we approach this problem from a novel perspective of invariant learning. However, how to construct environments from finite user behavior training data to generalize any modality missing is challenging. To tackle this issue, we propose a novel Multimodality Invariant Learning reCommendation (a.k.a. MILK) framework. Specifically, MILK first designs a cross-modality alignment module to keep semantic consistency from pretrained multimedia item features. After that, MILK designs multi-modal heterogeneous environments with cyclic mixup to augment training data, in order to mimic any modality missing for invariant user preference learning. Extensive experiments on three real datasets verify the superiority of our proposed framework. The code is available at https://github.com/HaoyueBai98/MILK.
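To make the idea of modality-missing environments concrete, the following is a minimal sketch and not the authors' implementation (the released code is at the URL above). It assumes each item has one visual and one textual feature vector that have already been aligned to the same dimension, represents a missing modality by a zero vector, and uses a plain two-way mixup as a stand-in for the cyclic mixup named in the abstract. The names `build_environments`, `mix`, and `lam` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mix(a: Vector, b: Vector, lam: float) -> Vector:
    """Convex combination lam * a + (1 - lam) * b, element by element."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def build_environments(
    visual: Vector, textual: Vector, lam: float = 0.5
) -> Dict[str, Tuple[Vector, Vector]]:
    """Return (visual, textual) feature pairs for several assumed environments.

    Assumption: a missing modality is represented by an all-zero vector, and
    the "mixed" environment is a simple two-way mixup, not the paper's exact
    cyclic mixup.
    """
    if len(visual) != len(textual):
        raise ValueError("modalities must be aligned to the same dimension")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    zeros = [0.0] * len(visual)
    return {
        "complete": (visual, textual),
        "visual_missing": (zeros, textual),
        "textual_missing": (visual, zeros),
        "mixed": (mix(visual, textual, lam), mix(textual, visual, lam)),
    }


# Toy item with 3-dimensional features in each modality.
visual = [1.0, 0.0, 2.0]
textual = [0.0, 1.0, 2.0]
envs = build_environments(visual, textual, lam=0.5)
```

An invariant-learning objective would then train one preference model across all of these environments and penalize differences in its behavior between them, so that predictions do not depend on which modalities happen to be present.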