Abstract:

Purpose: Tags help promote customer engagement on video-sharing platforms. Video tag recommender systems are artificial intelligence-enabled frameworks that strive to recommend precise tags for videos. Extant video tag recommender systems are uninterpretable, which leads to distrust of the recommendation outcome, hesitation in tag adoption and difficulty in debugging the system. This study aims to construct a novel, interpretable video tag recommender system to assist video-sharing platform users in tagging their newly uploaded videos.

Design/methodology/approach: The proposed interpretable video tag recommender system is a multimedia deep learning framework composed of convolutional neural networks (CNNs), which receives texts and images as inputs. The interpretability of the proposed system is realized through layer-wise relevance propagation.

Findings: The case study and user study demonstrate that the proposed interpretable multimedia CNN model can effectively explain its recommended tag to users by highlighting the keywords and key patches that contribute the most to the recommended tag. Moreover, the proposed model outperforms state-of-the-art models in recommendation performance.

Practical implications: The interpretability of the proposed recommender system makes its decision process more transparent, builds users' trust in the recommender system and prompts users to adopt the recommended tags. By labeling videos with human-understandable and accurate tags, the exposure of videos to their target audiences would increase, which enhances information technology (IT) adoption, customer engagement, value co-creation and precision marketing on the video-sharing platform.

Originality/value: To the best of our knowledge, the proposed model is both the first explainable video tag recommender system and the first explainable multimedia tag recommender system.

Explainable Deep Learning for Video Recognition Tasks: A Framework & Recommendations

Exploring Explainability in Video Action Recognition

Discriminating Spatial and Temporal Relevance in Deep Taylor Decompositions for Explainable Activity Recognition

Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

Explaining First Impressions: Modeling, Recognizing, and Explaining Apparent Personality from Videos

Explainable Deep Learning Framework for Human Activity Recognition

Deep Learning for Robust and Explainable Models in Computer Vision

An explainable and efficient deep learning framework for video anomaly detection

Contextual Explainable Video Representation: Human Perception-based Understanding

Explainability of deep learning models in medical video analysis: a survey

Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications

Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey

Explainable Deep Learning: A Visual Analytics Approach with Transition Matrices

A Reinforcement Learning Framework for Explainable Recommendation

Interpretable Deep Learning Models: Enhancing Transparency and Trustworthiness in Explainable AI

Explainable AI approaches in deep learning: Advancements, applications and challenges

Explaining Deep Face Algorithms through Visualization: A Survey

Explaining Deep Neural Networks by Leveraging Intrinsic Methods

Interpretable video tag recommendation with multimedia deep learning framework

From Pixels to Words: Leveraging Explainability in Face Recognition through Interactive Natural Language Processing

XAI-FR: Explainable AI-Based Face Recognition Using Deep Neural Networks