Abstract:<p>Explainable recommendation, which explains why an item is recommended, has attracted growing attention in both the research and industry communities. However, most existing explainable recommendation methods cannot provide multi-modal explanations that combine textual and visual modalities, nor adaptive explanations tailored to the user's dynamic preferences, which can degrade customers' satisfaction with, confidence in, and trust in the recommender system. On the technical side, the Recurrent Neural Network (RNN) has become the most prevalent technique for modelling dynamic user preferences. Owing to the natural characteristics of the RNN, its hidden state combines long-term dependency and short-term interest to some degree. However, it works like a black box, and the monotonic temporal dependency of the RNN is not sufficient to capture the user's short-term interest.</p><p>In this paper, to address these issues, we propose a novel Attentive Recurrent Neural Network (Ante-RNN) with textual and visual fusion for dynamic explainable recommendation. Specifically, our model learns, in parallel, image representations with textual alignment and text representations with a topical attention mechanism. A novel dynamic contextual attention mechanism is then incorporated into Ante-RNN to model the complicated correlations among recent items and to strengthen the user's short-term interests. By combining the full latent visual-semantic alignments with a hybrid attention mechanism comprising topical and contextual attention, Ante-RNN makes the recommendation process more transparent and explainable. Extensive experimental results on two real-world datasets demonstrate the superior performance and explainability of our model.</p>
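
The idea of attending over recent items can be illustrated with a minimal sketch. This is not the Ante-RNN formulation, which the abstract does not specify: the function name `contextual_attention`, the `window` parameter, and the dot-product scoring are assumptions chosen for illustration. The sketch takes a sequence of hidden states, scores the most recent ones against the latest state, and returns softmax weights together with the weighted context vector; such weights are one way an attention model can indicate which recent items influenced a recommendation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_attention(hidden_states, window=3):
    """Attend over the last `window` hidden states (hypothetical helper).

    Assumption: dot-product scoring against the latest hidden state.
    The paper's actual scoring function is not given in the abstract.
    """
    if not hidden_states:
        raise ValueError("need at least one hidden state")
    query = hidden_states[-1]           # latest state acts as the query
    recent = hidden_states[-window:]    # recent items only (short-term interest)
    scores = [sum(q * h for q, h in zip(query, state)) for state in recent]
    weights = softmax(scores)
    # Weighted sum of the recent states, one value per hidden dimension.
    context = [
        sum(w * state[d] for w, state in zip(weights, recent))
        for d in range(len(query))
    ]
    return weights, context


# Toy hidden states for three recent items (illustrative values only).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = contextual_attention(states, window=3)
```

Here `weights` sums to 1 and can be read as a per-item relevance score, while `context` would typically be combined with the RNN's hidden state before scoring candidate items.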

Visual and Textual Jointly Enhanced Interpretable Fashion Recommendation

Personalized Fashion Recommendation with Visual Explanations Based on Multimodal Attention Network

Visually Explainable Recommendation

Fashion Recommendation on Street Images

Visually-Aware Personalized Recommendation using Interpretable Image Representations

Visually-aware Recommendation with Aesthetic Features

Enhancing Visual Fashion Recommendations with Users in the Loop

Visually-Aware Fashion Recommendation and Design with Generative Image Models

COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation

Multi-modal clothing recommendation model based on large model and VAE enhancement

Merging Visual Features and Temporal Dynamics in Sequential Recommendation

Interpretable Multimodal Retrieval for Fashion Products

Learning Fashion Compatibility with Bidirectional LSTMs

Aesthetic-based Clothing Recommendation

Personal Tastes vs. Fashion Trends: Predicting Ratings Based on Visual Appearances and Reviews

Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference

Dynamic attention-based explainable recommendation with textual and visual fusion

Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering

Large Scale Visual Recommendations From Street Fashion Images

A Picture is Worth a Thousand Words: Introducing Visual Similarity into Recommendation

Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On