Abstract: Video summarization is a crucial research area that aims to make browsing and retrieving relevant information from the vast amount of video content available today more efficient. With the exponential growth of multimedia data, the ability to extract meaningful representations from videos has become essential. Video summarization techniques automatically generate concise summaries by selecting keyframes, shots, or segments that capture the video's essence. This process improves the efficiency and accuracy of various applications, including video surveillance, education, entertainment, and social media. Despite the importance of video summarization, diverse and representative datasets are lacking, which hinders comprehensive evaluation and benchmarking of algorithms. Existing evaluation metrics also fail to fully capture the complexities of video summarization, limiting accurate algorithm assessment and slowing the field's progress. To overcome data scarcity and improve evaluation, we propose an unsupervised approach that leverages the structure and information in video data to generate informative summaries. Because it does not rely on fixed annotations, our framework can produce representative summaries effectively. Moreover, we introduce an innovative evaluation pipeline tailored specifically for video summarization. Human participants compare our generated summaries to ground truth summaries and assess their informativeness. This human-centric approach provides valuable insights into the effectiveness of our proposed techniques. Experimental results demonstrate that our training-free framework outperforms existing unsupervised approaches and achieves competitive results compared to state-of-the-art supervised methods.

Unsupervised Video Summarization via Relation-Aware Assignment Learning

An Unsupervised Video Summarization Method Based on Multimodal Representation

A GAN Based Video Summarization Method with Representation Loss

Learning User Interest with Improved Triplet Deep Ranking and Web-Image Priors for Topic-Related Video Summarization

Relational Reasoning over Spatial-Temporal Graphs for Video Summarization

Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network

Exploring global diverse attention via pairwise temporal relation for video summarization

Deep Semantic and Attentive Network for Unsupervised Video Summarization

Automatic video knowledge mining for summary generation based on un-supervised statistical learning

Enhancing Video Summarization with Context Awareness

Deep Attentive Video Summarization with Distribution Consistency Learning

Neural Entity Summarization with Joint Encoding and Weak Supervision

Supervised Video Summarization via Multiple Feature Sets with Parallel Attention

Video Summarization with Long Short-term Memory

Conditional Modeling Based Automatic Video Summarization

Reconstructive Sequence-Graph Network for Video Summarization

Unsupervised Video Summarization via Multi-source Features

Unsupervised video summarization framework using keyframe extraction and video skimming

Video Captioning Via Relation-Aware Graph Learning

Learning Multiscale Hierarchical Attention for Video Summarization

Unsupervised video summarization with adversarial graph-based attention network