User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation.

Siyu Huang, Xi Li, Zhongfei Zhang, Fei Wu, Junwei Han
DOI: https://doi.org/10.1109/TIP.2018.2889265
2019-01-01
Abstract: Video summarization is a challenging task, mainly due to the difficulties in learning complicated semantic structural relations between videos and summaries. In this paper, we present a novel supervised video summarization scheme based on three-stage deep neural networks. The scheme takes a divide-and-conquer strategy to resolve the complicated task of 3D video summarization into a set of easy and flexible computational subtasks, and then sequentially applies 2D CNNs, 1D CNNs, and LSTM to address the subtasks in a hierarchical fashion. The hierarchical modeling of spatio-temporal structure leads to high performance and efficiency. In addition, we propose a simple but effective user-ranking method to cope with the labeling subjectivity problem of user-created video summarization, which refines labeling quality for robust supervised learning. Experimental results show that our approach outperforms the state-of-the-art video summarization methods on two benchmark datasets.