Abstract: With the rapid growth of multimedia-sharing platforms (e.g., Twitter and TikTok), multimedia recommender systems have become fundamental for helping users cope with information overload and discover items of interest. Existing multimedia recommendation methods often incorporate auxiliary modalities (e.g., visual, textual, and acoustic) to describe item characteristics and improve task performance. However, these methods usually assume that every item comes with all modalities, ignoring how often modalities are missing in real-world scenarios. To address missing modalities, we propose a novel framework, Contrastive Intra- and Inter-Modality Generation (CI2MG), for enhancing incomplete multimedia recommendation. We first develop a contrastive intra- and inter-modality generation module for the missing modalities, in which the intra-modality representation is updated through clustering-based hypergraph convolution and the inter-modality representation is obtained by optimal transport between different modalities. To tackle the insufficient and incomplete supervision labels available during intra- and inter-modality generation, we introduce a modality-aware contrastive learning paradigm based on augmentation between the intra-modality view and the inter-modality view. Furthermore, to learn task-related representations from the generated modalities and further improve recommendation performance, we design an enhanced multimedia recommendation module that mitigates the influence of task-irrelevant noise. Extensive experiments on real-world datasets show that the proposed CI2MG framework outperforms state-of-the-art baselines for personalized multimedia recommendation in terms of Recall, NDCG, and Precision.
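
The abstract reports results in terms of Recall, NDCG, and Precision. As a minimal sketch of how these top-K ranking metrics are commonly computed for a single user, assuming binary relevance and a cutoff K (the function names and the toy data below are illustrative assumptions, not taken from the paper):

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the user's relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy example: one user's ranked list and held-out relevant items.
ranked_items = ["a", "b", "c", "d"]
relevant_items = {"a", "c", "e"}
print(precision_at_k(ranked_items, relevant_items, 3))
print(recall_at_k(ranked_items, relevant_items, 3))
print(ndcg_at_k(ranked_items, relevant_items, 3))
```

In practice these per-user values are typically averaged over all test users. The paper's exact evaluation protocol (cutoff values, data splits) is not stated in the abstract.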

Understanding Modality Preferences in Search Clarification

Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search

MIMICS: A Large-Scale Data Collection for Search Clarification

Contrastive Intra- and Inter-Modality Generation for Enhancing Incomplete Multimedia Recommendation

Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness

Generating Clarifying Questions for Information Retrieval

Users Meet Clarifying Questions: Toward a Better Understanding of User Interactions for Search Clarification

Multimodal analysis of user behavior and browsed content under different image search intents

Corpus-informed Retrieval Augmented Generation of Clarifying Questions

Image Search by modality analysis: A study of color semantics

AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs

ManyModalQA: Modality Disambiguation and QA over Diverse Inputs

Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense

Multi-modal Learnable Queries for Image Aesthetics Assessment

Online and Offline Evaluation in Search Clarification

Multi-modal query expansion for web video search

Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge

Modality-Agnostic Attention Fusion for visual search with text feedback

Multi-Modal Web Search Query Refinement Based on Semi-Supervised Learning

A Survey of Multimodal Composite Editing and Retrieval