Abstract: This paper presents a novel approach to unsupervised video summarization using reinforcement learning. It aims to address limitations of current unsupervised methods, including the unstable training of adversarial generator-discriminator architectures and the reliance on hand-crafted reward functions for quality evaluation. The proposed method is based on the idea that a concise and informative summary should yield a reconstructed video that closely resembles the original. The summarizer model assigns an importance score to each frame and generates a video summary. Reinforcement learning, coupled with a unique reward generation pipeline, is used to train the summarizer model so that its summaries lead to improved reconstructions. The pipeline comprises a generator model that reconstructs masked frames from a partially masked video, along with a reward mechanism that compares the video reconstructed from the summary against the original. The video generator is trained in a self-supervised manner to reconstruct randomly masked frames, enhancing its ability to generate accurate summaries. This training pipeline results in a summarizer model that mimics human-generated video summaries more closely than methods relying on hand-crafted rewards. Unlike adversarial architectures, the training process consists of two stable and isolated training steps. Experimental results demonstrate promising performance, with F-scores of 62.3 and 54.5 on the TVSum and SumMe datasets, respectively. Additionally, the inference stage is 300 times faster than our previously reported state-of-the-art method.
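The reward idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the learned generator is replaced here by plain linear interpolation between kept frames, frames are short feature lists rather than images, and the reward form `1 / (1 + MSE)` is an assumption chosen only so that better reconstructions score higher. The function names `sample_summary`, `reconstruct`, and `reward` are hypothetical.

```python
import random


def sample_summary(scores, rng):
    """Sample a keep (1) / drop (0) decision per frame, treating each
    importance score as a Bernoulli probability (an assumed policy)."""
    return [1 if rng.random() < s else 0 for s in scores]


def reconstruct(frames, keep):
    """Stand-in for the learned generator (assumption): fill every dropped
    frame by linear interpolation between the nearest kept frames."""
    kept = [i for i, k in enumerate(keep) if k]
    if not kept:
        # Nothing was kept, so there is nothing to interpolate from.
        return [[0.0] * len(f) for f in frames]
    out = []
    for i, frame in enumerate(frames):
        if keep[i]:
            out.append(list(frame))
            continue
        left = max((j for j in kept if j < i), default=None)
        right = min((j for j in kept if j > i), default=None)
        if left is None:
            out.append(list(frames[right]))
        elif right is None:
            out.append(list(frames[left]))
        else:
            w = (i - left) / (right - left)
            out.append([(1 - w) * a + w * b
                        for a, b in zip(frames[left], frames[right])])
    return out


def reward(frames, recon):
    """Assumed reward: 1 / (1 + mean squared error) between the original
    and the reconstruction, so a perfect reconstruction scores 1.0."""
    n = len(frames) * len(frames[0])
    mse = sum((a - b) ** 2
              for f, r in zip(frames, recon)
              for a, b in zip(f, r)) / n
    return 1.0 / (1.0 + mse)


# Toy video: five one-dimensional "frames" with a jump at the end.
frames = [[0.0], [1.0], [2.0], [3.0], [10.0]]
informative = [1, 0, 0, 1, 1]    # keeps the frame just before the jump
uninformative = [1, 0, 0, 0, 1]  # drops it

r_good = reward(frames, reconstruct(frames, informative))
r_bad = reward(frames, reconstruct(frames, uninformative))

# A sampled summary from hypothetical importance scores.
rng = random.Random(0)
sampled = sample_summary([0.9, 0.1, 0.1, 0.8, 0.9], rng)
r_sampled = reward(frames, reconstruct(frames, sampled))
```

In this toy case the summary that keeps the frame before the jump reconstructs the video more closely and therefore earns the higher reward. In a training loop, a reward of this kind would weight the update of the summarizer's frame scores; the abstract does not name the specific reinforcement learning algorithm, so that step is left out.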

Crowd Aware Summarization of Surveillance Videos by Deep Reinforcement Learning

Creating Personalized Video Summaries Via Semantic Event Detection

Learning User Interest with Improved Triplet Deep Ranking and Web-Image Priors for Topic-Related Video Summarization

Video Summarisation by Classification with Deep Reinforcement Learning

A GAN Based Video Summarization Method with Representation Loss

Progressive Reinforcement Learning for Video Summarization

Video Summarization through Reinforcement Learning with a 3D Spatio-Temporal U-Net

Action Parsing-Driven Video Summarization Based on Reinforcement Learning

Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator

Event-based Large Scale Surveillance Video Summarization

Be Relevant, Non-Redundant, and Timely: Deep Reinforcement Learning for Real-Time Event Summarization

Category Driven Deep Recurrent Neural Network for Video Summarization

Video Summarization with Long Short-term Memory

Ultrasound Video Summarization using Deep Reinforcement Learning

Video Summarization Generation Model Based on Transformer and Deep Reinforcement Learning

Spatial Attention Model‐modulated Bi‐directional Long Short‐term Memory for Unsupervised Video Summarisation

Deep Attentive Video Summarization with Distribution Consistency Learning

Edge-Cloud Collaborative Streaming Video Analytics with Multi-agent Deep Reinforcement Learning

Deep Semantic and Attentive Network for Unsupervised Video Summarization

User-Ranking Video Summarization with Multi-Stage Spatio-Temporal Representation

Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network