Emotion Knowledge Driven Video Highlight Detection

Fan Qi, Xiaoshan Yang, Changsheng Xu
DOI: https://doi.org/10.1109/tmm.2020.3035285
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: This paper addresses video highlight detection, which aims to select a small subset of frames according to the user's major or special interests. The performance of conventional methods depends heavily on large-scale manually labeled training data, which is time-consuming and labor-intensive to collect. To deal with this problem, we trace back to the original problem definition and find that whether a user is interested in a specific video segment depends heavily on subjective human emotions. Leveraging this insight, we introduce an emotion knowledge driven video highlight detection framework for modeling general human emotion and inferring highlight strength. First, we obtain the concept-level representation of the video clip with a front-end network. The concepts are used as nodes to build an emotion-related knowledge graph, and their relationships in the graph are modeled via external public knowledge graphs. Then we adopt Siamese GCNs to model the dependencies between nodes in the graph and propagate messages along the edges. Finally, we compute the emotion-aware representation of the video clip based on the GCN layers and use it to predict the highlight score. Our framework, including the front-end network, the graph convolution layers, and the highlight mapping network, can be trained end to end under the constraint of a ranking loss. Experiments on two benchmark datasets show that the proposed method performs favorably against state-of-the-art methods.
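The pipeline summarized in the abstract (concept nodes, graph convolution over a knowledge graph, pooling into a clip representation, and a ranking loss over clip pairs) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the symmetric adjacency normalization, the two-layer depth, mean pooling, the linear scoring vector, the hinge-style margin loss, and every name below (`normalize_adjacency`, `gcn_layer`, `highlight_score`, `pairwise_ranking_loss`) are assumptions chosen for illustration, and the random inputs stand in for concept features that a front-end network would produce.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (an assumed, common GCN choice)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


def highlight_score(concept_feats, a_norm, w1, w2, v):
    """Map concept-node features of one clip to a scalar highlight score."""
    h = gcn_layer(a_norm, concept_feats, w1)
    h = gcn_layer(a_norm, h, w2)
    clip_repr = h.mean(axis=0)                  # assumed pooling over nodes
    return float(clip_repr @ v)                 # assumed linear scoring head


def pairwise_ranking_loss(s_pos, s_neg, margin=1.0):
    """Hinge loss that pushes the highlight clip's score above the other clip's."""
    return max(0.0, margin - s_pos + s_neg)


rng = np.random.default_rng(0)

# Hypothetical 4-concept graph; edges would come from an external knowledge graph.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
a_norm = normalize_adjacency(adj)

# Shared ("Siamese") weights: both clips pass through the same parameters.
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 16))
v = rng.normal(size=16)

highlight_clip = rng.normal(size=(4, 8))        # stand-in concept features
ordinary_clip = rng.normal(size=(4, 8))

s_pos = highlight_score(highlight_clip, a_norm, w1, w2, v)
s_neg = highlight_score(ordinary_clip, a_norm, w1, w2, v)
loss = pairwise_ranking_loss(s_pos, s_neg)
```

In a trainable version the weights would be updated by backpropagating this loss through both branches, which is what makes the shared-weight arrangement a Siamese setup.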
computer science, information systems, telecommunications, software engineering