Video Highlight Detection Via Region-Based Deep Ranking Model

Yifan Jiao, Tianzhu Zhang, Shucheng Huang, Bin Liu, Changsheng Xu
DOI: https://doi.org/10.1142/S0218001419400019
IF: 1.261
2019-01-01
International Journal of Pattern Recognition and Artificial Intelligence
Abstract: The video highlight detection task is to localize key elements (moments of major or special interest to the user) in a video. Most existing highlight detection approaches extract features from the video segment as a whole, without considering how local features differ across space. Spatially, not all regions are worth watching: some contain only the background of the environment, with no humans or other moving objects, especially when the background is heavily cluttered. To deal with this issue, we propose a novel region-based model that automatically localizes the key elements in a video without any extra supervised annotations. Specifically, the proposed model produces position-sensitive score maps for local regions in the spatial dimension of the video segment, and then aggregates all position-wise scores with a position-pooling operation. The regions with higher response values are extracted as key elements, so that more effective features of the video segment are obtained to predict the highlight score. The proposed position-sensitive scheme can be easily integrated into an end-to-end fully convolutional network whose parameters are updated by stochastic gradient descent during backpropagation, which improves the robustness of the model. Extensive experimental results on the YouTube and SumMe datasets demonstrate that the proposed approach achieves significant improvement over state-of-the-art methods.
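The position-pooling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes a common position-sensitive pooling scheme: the frame is split into a k x k grid, each of the k*k score maps is read only inside its own grid cell, and the values in that cell are averaged. The function names `position_pool`, `top_regions`, and `segment_score`, and the choice of averaging to aggregate the cells, are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one H x W score map


def position_pool(score_maps: List[Grid], k: int) -> List[List[float]]:
    """Pool k*k position-sensitive score maps into a k x k grid of responses.

    Assumption: map number i*k + j is read only inside spatial bin (i, j),
    and the values inside that bin are averaged.
    """
    if len(score_maps) != k * k:
        raise ValueError("expected k*k score maps")
    height, width = len(score_maps[0]), len(score_maps[0][0])
    if height % k or width % k:
        raise ValueError("this sketch assumes H and W are divisible by k")
    bin_h, bin_w = height // k, width // k
    pooled = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            grid = score_maps[i * k + j]
            total = sum(
                grid[y][x]
                for y in range(i * bin_h, (i + 1) * bin_h)
                for x in range(j * bin_w, (j + 1) * bin_w)
            )
            pooled[i][j] = total / (bin_h * bin_w)
    return pooled


def top_regions(pooled: List[List[float]], n: int) -> List[Tuple[int, int]]:
    """Return the (row, col) bins with the n highest pooled responses."""
    cells = [(v, i, j) for i, row in enumerate(pooled) for j, v in enumerate(row)]
    cells.sort(key=lambda c: -c[0])  # stable sort: ties keep row-major order
    return [(i, j) for _, i, j in cells[:n]]


def segment_score(pooled: List[List[float]]) -> float:
    """Aggregate all position-wise responses into one score (mean, by assumption)."""
    values = [v for row in pooled for v in row]
    return sum(values) / len(values)


# Toy input: k = 2, four 2 x 2 score maps, each non-zero only in its own bin.
maps = [
    [[4, 0], [0, 0]],  # map for bin (0, 0)
    [[0, 1], [0, 0]],  # map for bin (0, 1)
    [[0, 0], [2, 0]],  # map for bin (1, 0)
    [[0, 0], [0, 3]],  # map for bin (1, 1)
]
pooled = position_pool(maps, k=2)
print(pooled)                  # [[4.0, 1.0], [2.0, 3.0]]
print(top_regions(pooled, 2))  # [(0, 0), (1, 1)]
print(segment_score(pooled))   # 2.5
```

In this toy case the two bins with the highest responses, (0, 0) and (1, 1), play the role of the key elements. In a trained network the score maps would come from a convolutional layer and the pooling would be differentiable, so the whole pipeline could be optimized end to end.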