Learning Multiscale Hierarchical Attention for Video Summarization

Wencheng Zhu, Jiwen Lu, Yucheng Han, Jie Zhou
DOI: https://doi.org/10.1016/j.patcog.2021.108312
IF: 8
2022-01-01
Pattern Recognition
Abstract: In this paper, we propose a multiscale hierarchical attention approach for supervised video summarization. Unlike most existing supervised methods, which employ bidirectional long short-term memory networks, our method exploits the underlying hierarchical structure of video sequences and learns both short-range and long-range temporal representations via an intra-block and an inter-block attention. Specifically, we first separate each video sequence into blocks of equal length and employ the intra-block and inter-block attention to learn local and global information, respectively. Then, we integrate the frame-level, block-level, and video-level representations for frame-level importance score prediction. Next, we conduct shot segmentation and compute shot-level importance scores. Finally, we perform key shot selection to produce video summaries. Moreover, we extend our method into a two-stream framework, where appearance and motion information is leveraged. Experimental results on the SumMe and TVSum datasets validate the effectiveness of our method against state-of-the-art methods. (c) 2021 Elsevier Ltd. All rights reserved.
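The block-then-attend pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the abstract does not specify the attention form, the pooling, or the scoring head, so this sketch assumes plain scaled dot-product self-attention without learned projections, mean pooling to obtain block-level and video-level representations, and a single linear layer with a sigmoid for scoring. The names `self_attention`, `hierarchical_scores`, `block_len`, `w`, and `b` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Scaled dot-product self-attention with no learned projections
    # (an assumption made to keep the sketch short).
    # x has shape (..., n, d); the output has the same shape.
    d = x.shape[-1]
    weights = softmax(x @ np.swapaxes(x, -1, -2) / np.sqrt(d), axis=-1)
    return weights @ x


def hierarchical_scores(frames, block_len, w, b=0.0):
    # frames: (T, d) frame features; T is assumed divisible by block_len.
    T, d = frames.shape
    assert T % block_len == 0, "pad or trim the video to a multiple of block_len"
    n_blocks = T // block_len

    # 1) Split the sequence into equal-length blocks.
    blocks = frames.reshape(n_blocks, block_len, d)

    # 2) Intra-block attention: each frame attends only within its block
    #    (short-range, local information).
    local = self_attention(blocks)                      # (n_blocks, block_len, d)

    # 3) Inter-block attention over pooled block representations
    #    (long-range, global information). Mean pooling is an assumption.
    block_repr = self_attention(local.mean(axis=1))     # (n_blocks, d)

    # 4) Video-level representation: mean over blocks (assumption).
    video_repr = block_repr.mean(axis=0)                # (d,)

    # 5) Integrate frame-, block-, and video-level features per frame.
    frame_level = local.reshape(T, d)
    block_level = np.repeat(block_repr, block_len, axis=0)
    video_level = np.broadcast_to(video_repr, (T, d))
    fused = np.concatenate([frame_level, block_level, video_level], axis=1)

    # 6) Frame-level importance scores in [0, 1] from a linear head.
    return 1.0 / (1.0 + np.exp(-(fused @ w + b)))


# Usage with random features: 12 frames, 8-dim features, blocks of 4 frames.
rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 8))
w = rng.normal(size=24)  # 3 * d weights for the fused features
scores = hierarchical_scores(frames, block_len=4, w=w)
```

In a trained model the attention layers and scoring head would carry learned parameters; the sketch only shows how local and global context can be combined per frame before the shot segmentation and key shot selection steps.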