Graph Attention Networks Adjusted Bi-LSTM for Video Summarization

Rui Zhong, Rui Wang, Yang Zou, Zhiqiang Hong, Min Hu
DOI: https://doi.org/10.1109/lsp.2021.3066349
2021-01-01
IEEE Signal Processing Letters
Abstract: High redundancy among keyframes is a critical issue for prior summarization methods when dealing with user-created videos. To address it, we present a Graph Attention Networks (GAT) adjusted Bi-directional Long Short-term Memory (Bi-LSTM) model for unsupervised video summarization. First, the GAT is adopted to transform an image's visual features into higher-level features through the Contextual Features based Transformation (CFT) mechanism. Specifically, a novel Salient-Area-Size-based spatial attention model is presented to extract frame-wise visual features, based on the observation that humans tend to focus on sizable and moving objects. Second, the higher-level visual features are integrated with semantic features processed by the Bi-LSTM to refine the frame-wise probability of being selected as a keyframe. Extensive experiments demonstrate that our method outperforms state-of-the-art methods.
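The GAT step described in the abstract can be illustrated with a minimal sketch of a generic single-head graph attention layer applied to frame-wise features. This is not the authors' implementation: the abstract does not specify the graph construction, the CFT mechanism, or the attention details, so the temporal-window adjacency, the dimensions, the random weights, and the function name `graph_attention` are all assumptions made for illustration.

```python
import numpy as np

def graph_attention(features, adjacency, W, a, slope=0.2):
    """Generic single-head graph attention over frame features (illustrative only).

    features:  (n_frames, d_in) frame-wise visual features
    adjacency: (n_frames, n_frames) boolean mask, True where frame j is a
               neighbour of frame i; every row needs at least one True entry
               (for example a self-loop), otherwise the softmax is undefined
    W:         (d_in, d_out) shared linear projection
    a:         (2 * d_out,) attention vector
    """
    h = features @ W                       # project every frame: (n, d_out)
    d_out = h.shape[1]
    # e[i, j] = LeakyReLU(a^T [h_i || h_j]), split into source and target terms
    src = h @ a[:d_out]                    # contribution of frame i
    dst = h @ a[d_out:]                    # contribution of frame j
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, slope * e)      # LeakyReLU
    e = np.where(adjacency, e, -np.inf)    # ignore non-neighbours
    e = e - e.max(axis=1, keepdims=True)   # subtract row max for stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # row-wise softmax
    return alpha @ h, alpha                # aggregated features, attention weights

# Hypothetical toy input: 6 frames with 8-dimensional features.
rng = np.random.default_rng(0)
n_frames, d_in, d_out = 6, 8, 4
features = rng.normal(size=(n_frames, d_in))

# Assumed graph: frames within 2 time steps are neighbours (self-loops included).
idx = np.arange(n_frames)
adjacency = np.abs(idx[:, None] - idx[None, :]) <= 2

W = rng.normal(size=(d_in, d_out))
a = rng.normal(size=2 * d_out)
higher, alpha = graph_attention(features, adjacency, W, a)
```

Each row of `alpha` is a distribution over a frame's neighbours, so `higher` holds context-aware features for every frame. In the pipeline the abstract describes, features of this kind would then be combined with Bi-LSTM outputs to score frames; that fusion step is not sketched here because the abstract gives no details about it.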