Query-based video summarization with multi-label classification network

Weifeng Hu, Yu Zhang, Yujun Li, Jia Zhao, Xifeng Hu, Yan Cui, Xuejing Wang
DOI: https://doi.org/10.1007/s11042-023-15126-1
IF: 2.577
2023-03-22
Multimedia Tools and Applications
Abstract: Generic video summarization algorithms produce a single summary for each video, which cannot satisfy the different requirements that different users have for the same video. This paper addresses query-based video summarization, which takes a user’s query and a long video as input and generates a summary tailored to that query. We propose a query-based video summarization algorithm with a multi-label classification network (MLC-SUM). Specifically, we treat video summarization as a target-based multi-label classification problem: we predict the correlation between video content and multi-concept labels by feeding convolutional features into a multi-layer perceptron, and then use the cross-correlation of the labels to weight the predicted probabilities. Finally, we select the video content most relevant to the user’s query sentence as the output summary. Experiments on three common datasets verify the effectiveness and superiority of the proposed algorithm.
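The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the network size, how the label cross-correlation is computed, or how the summary length is chosen. The sketch assumes precomputed per-shot convolutional features, a two-layer perceptron with sigmoid outputs, a row-normalised concept co-occurrence matrix as the "cross-correlation", and a top-k selection rule. All function names (`mlp_concept_probs`, `label_correlation`, `summarize`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mlp_concept_probs(features, w1, b1, w2, b2):
    """Two-layer perceptron with independent sigmoid outputs (multi-label).

    features: (n_shots, d) array of per-shot convolutional features (assumed given).
    Returns an (n_shots, n_concepts) array of per-concept probabilities.
    """
    hidden = np.maximum(0.0, features @ w1 + b1)  # ReLU hidden layer
    return sigmoid(hidden @ w2 + b2)


def label_correlation(label_matrix):
    """Row-normalised concept co-occurrence from binary training labels.

    Assumption: this stands in for the paper's label cross-correlation.
    label_matrix: (n_samples, n_concepts) array of 0/1 labels.
    """
    co = label_matrix.T @ label_matrix  # (n_concepts, n_concepts) counts
    return co / np.maximum(co.sum(axis=1, keepdims=True), 1e-12)


def summarize(probs, corr, query_concepts, k):
    """Weight probabilities by label correlation, then keep the k best shots.

    query_concepts: indices of the concepts named in the user's query.
    Returns the selected shot indices in temporal order, plus all scores.
    """
    weighted = probs @ corr.T  # each concept mixes in its correlated concepts
    scores = weighted[:, query_concepts].mean(axis=1)
    top = np.argsort(-scores, kind="stable")[:k]
    return np.sort(top), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_shots, d, n_hidden, n_concepts = 6, 8, 16, 4
    features = rng.normal(size=(n_shots, d))           # placeholder features
    w1, b1 = rng.normal(size=(d, n_hidden)), np.zeros(n_hidden)
    w2, b2 = rng.normal(size=(n_hidden, n_concepts)), np.zeros(n_concepts)
    train_labels = rng.integers(0, 2, size=(50, n_concepts))

    probs = mlp_concept_probs(features, w1, b1, w2, b2)
    corr = label_correlation(train_labels)
    selected, scores = summarize(probs, corr, query_concepts=[0, 2], k=2)
    print("selected shots:", selected)
```

Because each row of the correlation matrix sums to one, the weighted scores stay in the [0, 1] range, so they remain comparable across concepts. In practice the weights would be trained with a multi-label loss rather than drawn at random as in the demo.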
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering