Abstract: On most e-commerce platforms, whether displayed items trigger a user’s interest depends largely on their most eye-catching multimodal content. Consequently, increasing effort is devoted to modeling multimodal user preference, and the prevailing paradigm is to incorporate the complete multimodal deep features of items into the recommendation module. However, existing studies ignore the mismatch between multimodal feature extraction (MFE) and user interest modeling (UIM); that is, MFE and UIM have different emphases. Specifically, MFE is migrated from and adapted to upstream tasks such as image classification, and it is mainly a content-oriented, non-personalized process, whereas UIM, with its greater focus on understanding user interaction, is essentially a user-oriented, personalized process. Therefore, directly incorporating MFE into UIM for purely user-oriented tasks tends to introduce a large amount of preference-independent multimodal noise and to contaminate the embedding representations in UIM. This paper aims to solve the mismatch between MFE and UIM, so as to generate high-quality embedding representations and better model multimodal user preferences. To this end, we develop a novel model, multimodal entity graph collaborative filtering, MEGCF for short. The UIM of the proposed model captures the semantic correlation between interactions and the features obtained from MFE, thus achieving a better match between MFE and UIM. More precisely, semantic-rich entities are first extracted from the multimodal data, since they are more relevant to user preferences than other multimodal information. These entities are then integrated into the user-item interaction graph. Afterwards, a symmetric linear Graph Convolution Network (GCN) module is constructed to perform message propagation over the graph, in order to capture both high-order semantic correlation and collaborative filtering signals. Finally, the sentiment information from the review data is used to weight neighbor aggregation in the GCN at a fine-grained level, since it reflects the overall quality of the items and is therefore important modality information related to user preferences. Extensive experiments demonstrate the effectiveness and rationality of MEGCF.
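The pipeline described in the abstract (entities added to the user-item graph, linear propagation over a symmetrically normalized graph, sentiment-weighted aggregation) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the layer-averaging scheme, the use of a per-item sentiment score as an edge weight, and all names (`build_adjacency`, `normalize`, `propagate`, the toy data) are assumptions made purely for illustration.

```python
import numpy as np

def build_adjacency(n_users, n_items, n_entities, interactions, item_entities, sentiment=None):
    """Symmetric adjacency over users, items and entities (nodes in that order).

    Assumption: sentiment is a dict item -> score in (0, 1] that scales
    user-item edges; the paper's actual weighting scheme may differ.
    """
    n = n_users + n_items + n_entities
    adj = np.zeros((n, n))
    for u, i in interactions:
        w = 1.0 if sentiment is None else sentiment.get(i, 1.0)
        adj[u, n_users + i] = adj[n_users + i, u] = w
    for i, e in item_entities:
        a, b = n_users + i, n_users + n_items + e
        adj[a, b] = adj[b, a] = 1.0
    return adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes stay zero."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    inv_sqrt[mask] = deg[mask] ** -0.5
    return inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def propagate(adj_norm, emb, n_layers=2):
    """Linear message passing (no nonlinearity, no weight matrices).

    Assumption: the final embedding is the mean over all layers.
    """
    layers = [emb]
    for _ in range(n_layers):
        layers.append(adj_norm @ layers[-1])
    return np.mean(layers, axis=0)

# Toy data (hypothetical): 2 users, 3 items, 2 extracted entities.
n_users, n_items, n_entities = 2, 3, 2
interactions = [(0, 0), (0, 1), (1, 1), (1, 2)]      # (user, item)
item_entities = [(0, 0), (1, 0), (1, 1), (2, 1)]     # (item, entity)
sentiment = {0: 0.9, 1: 0.6, 2: 0.3}                 # per-item review sentiment

adj = build_adjacency(n_users, n_items, n_entities, interactions, item_entities, sentiment)
adj_norm = normalize(adj)

rng = np.random.default_rng(0)
emb = rng.normal(size=(n_users + n_items + n_entities, 4))
final = propagate(adj_norm, emb)

# Preference scores as inner products between user and item embeddings.
scores = final[:n_users] @ final[n_users:n_users + n_items].T
print(scores.shape)
```

In this sketch the entity nodes give items that share an entity a two-hop path to each other, which is one way high-order semantic correlation could mix with the collaborative signal during propagation.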

From Abstract to Details

Click-Through Rate Prediction Algorithm Based on Modeling of Implicit High-Order Feature Importance

Contrastive Intra- and Inter-Modality Generation for Enhancing Incomplete Multimedia Recommendation

Click-Through Rate Prediction with Multi-Modal Hypergraphs

Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation

Balancing Efficiency and Effectiveness: An LLM-Infused Approach for Optimized CTR Prediction

Graph Based Long-Term And Short-Term Interest Model for Click-Through Rate Prediction

CMBF: Cross-Modal-Based Fusion Recommendation Algorithm

OptMSM: Optimizing Multi-Scenario Modeling for Click-Through Rate Prediction

Consumer Intention Recognition and Behavior Prediction of Social E-commerce Users Based on Multimodal Fusion

MEGCF: Multimodal Entity Graph Collaborative Filtering for Personalized Recommendation

Multimodal Conditioned Diffusion Model for Recommendation

A Collaborative Ensemble Framework for CTR Prediction

DMBIN: A Dual Multi-behavior Interest Network for Click-Through Rate Prediction Via Contrastive Learning

MCRF: Enhancing CTR Prediction Models Via Multi-channel Feature Refinement Framework

Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations

CETN: Contrast-enhanced Through Network for CTR Prediction

Beyond Co-occurrence: Multi-modal Session-based Recommendation

ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer

Multi-scale and Multi-Channel Neural Network for Click-Through Rate Prediction