TaoHighlight: Commodity-Aware Multi-Modal Video Highlight Detection in E-Commerce

Zhaoyu Guo, Zhou Zhao, Weike Jin, Dazhou Wang, Ruitao Liu, Jun Yu
DOI: https://doi.org/10.1109/tmm.2021.3087001
IF: 7.3
2022-01-01
IEEE Transactions on Multimedia
Abstract: In e-commerce, product-related videos are important content for introducing product characteristics and attracting consumers. In the recommendation systems of e-commerce platforms in particular, video highlight detection methods are commonly used to select the most attractive clips to show to consumers, so as to improve the click-through rate of products. However, current research methods perform unsatisfactorily when applied to real-world scenarios. Compared with other video understanding tasks, video highlight detection is relatively abstract and subjective, and it is difficult to make accurate judgments using visual information alone. Consequently, we propose the multi-modal video highlight detection task, which introduces video-related linguistic information as supervisory information. We also propose a graph-based commodity-aware model to solve multi-modal video highlight detection in the e-commerce setting. Our model consists of a multi-modal highlight detection stage and a graph-based fine-tuning stage, in which we adopt a graph aggregation method to fuse multi-source natural language information and introduce an effective visual feature composition method for graph convolution network based highlight detection. In addition, we release the largest e-commerce video highlight detection dataset, TaoHighlight, in which the videos and related data are collected from the Taobao e-commerce platform. Our model achieves state-of-the-art results in every individual category and on the overall TaoHighlight dataset, demonstrating its superiority.