Abstract: With the rapid growth of multimedia-sharing platforms (e.g., Twitter and TikTok), multimedia recommender systems have become fundamental for helping users cope with information overload and discover items of interest. Existing multimedia recommendation methods often incorporate auxiliary modalities (e.g., visual, textual, and acoustic) to describe item characteristics and improve task performance. However, these methods usually assume that every item comes with all modalities, ignoring how often modalities are missing in real-world scenarios. To address missing modalities, we propose a novel framework, Contrastive Intra- and Inter-Modality Generation (CI2MG), for enhancing incomplete multimedia recommendation. We first develop a contrastive intra- and inter-modality generation module for the missing modalities, in which the intra-modality representation is updated through clustering-based hypergraph convolution and the inter-modality representation is obtained by optimal transport between different modalities. Because supervision labels are insufficient and incomplete during intra- and inter-modality generation, we introduce a modality-aware contrastive learning paradigm that treats the intra-modality view and the inter-modality view as augmentations of each other. Furthermore, to learn task-related representations from the generated modalities and further improve recommendation performance, we design an enhanced multimedia recommendation module that reduces the influence of task-irrelevant noise. Extensive experiments on real-world datasets show that the proposed CI2MG framework outperforms state-of-the-art baselines on personalized multimedia recommendation in terms of Recall, NDCG, and Precision.
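To make the inter-modality step more concrete, the following is a minimal sketch of how optimal transport could be used to estimate a missing modality from an observed one. It is not the CI2MG implementation: the function names `sinkhorn_plan` and `impute_missing`, the cosine-distance cost, the uniform marginals, the entropic regularisation, and the assumption that both modalities have already been projected into a shared embedding space are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.5, n_iters=200):
    """Entropy-regularised optimal transport plan (Sinkhorn iterations).

    Assumes uniform marginals over the rows and columns of `cost`.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # row marginal (source items)
    b = np.full(m, 1.0 / m)          # column marginal (target items)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows to match a
        v = b / (K.T @ u)            # rescale columns to match b
    return u[:, None] * K * v[None, :]


def impute_missing(source_feats, target_bank, reg=0.5):
    """Estimate target-modality features for items that only have source features.

    source_feats: (n, d) embeddings of the observed modality (hypothetical input).
    target_bank:  (m, d) embeddings of the other modality from complete items,
                  assumed to live in the same d-dimensional space.
    """
    s = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    t = target_bank / np.linalg.norm(target_bank, axis=1, keepdims=True)
    cost = 1.0 - s @ t.T             # cosine distance, values in [0, 2]
    plan = sinkhorn_plan(cost, reg=reg)
    # Barycentric projection: each source item becomes a plan-weighted
    # average of the target-modality embeddings.
    return (plan @ target_bank) / plan.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
visual_only = rng.normal(size=(5, 16))   # items missing the textual modality
text_bank = rng.normal(size=(8, 16))     # textual embeddings of complete items
generated_text = impute_missing(visual_only, text_bank)
print(generated_text.shape)
```

In this sketch the transport plan acts as a soft alignment between items of the two modalities, so each generated embedding is a convex combination of real target-modality embeddings. A smaller `reg` gives a sharper alignment at the cost of slower convergence.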

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval

Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation

Contrastive Intra- and Inter-Modality Generation for Enhancing Incomplete Multimedia Recommendation

Retrieving Multimodal Information for Augmented Generation: A Survey

Deep Vision Multimodal Learning: Methodology, Benchmark, and Trend

Gen-IR @ SIGIR 2023: The First Workshop on Generative Information Retrieval

LLMs Meet Multimodal Generation and Editing: A Survey

McGE '24: the 2nd International Workshop on Multimedia Content Generation and Evaluation: New Methods & Practice

Multimodal Image Synthesis and Editing: The Generative AI Era

Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond

Cross-Modal Knowledge Discovery, Inference, and Challenges

A Survey of Multimodal Composite Editing and Retrieval

Advanced Embedding Techniques in Multimodal Retrieval Augmented Generation: A Comprehensive Study on Cross Modal AI Applications

Effective Deep Learning-Based Multi-Modal Retrieval

Multi-modal Deep Analysis for Multimedia

McGE '23: 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

From Matching to Generation: A Survey on Generative Information Retrieval

Future of Information Retrieval Research in the Age of Generative AI

Unified Text-to-Image Generation and Retrieval