A GAN Based Video Summarization Method with Representation Loss

Zhuo Lei, Qiang Yu, Lidan Shou, Shengquan Li, Yunqing Mao
DOI: https://doi.org/10.1145/3652583.3657621
2024-01-01
Abstract: An effective video summary should capture the entire narrative and highlight its most critical content. However, supervised approaches rely heavily on labor-intensive, time-consuming manual annotations. To address this issue, we propose a Convolutional Attentive Adversarial Network, which learns a deep summarization model in an unsupervised manner. We cast the task in a Generative Adversarial Network (GAN) framework, in which a generator assigns importance scores to all video frames and a discriminator distinguishes score-weighted frame features from their original counterparts. We introduce a novel representative loss function, complemented by adversarial, sparsity, and reconstruction losses, to guide the prediction of frame importance scores. To validate the proposed method, we conduct extensive experiments on two public benchmark datasets, SumMe and TVSum. The results show that our approach surpasses other state-of-the-art methods.
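The abstract does not give the loss definitions, so the following is only a minimal sketch of two of the ingredients it names: score-weighted frame features and the sparsity and reconstruction terms. The function names, the target ratio of 0.3, and the specific loss forms (absolute gap to a target mean score, mean squared error) are assumptions borrowed from common unsupervised summarization setups, not the authors' formulation. The representative and adversarial losses are omitted because the abstract does not describe them.

```python
def weight_features(features, scores):
    """Scale each frame's feature vector by its importance score in [0, 1]."""
    return [[s * x for x in frame] for frame, s in zip(features, scores)]


def sparsity_loss(scores, target_ratio=0.3):
    """Assumed form: gap between the mean score and a target summary ratio."""
    return abs(sum(scores) / len(scores) - target_ratio)


def reconstruction_loss(original, reconstructed):
    """Assumed form: mean squared error between two feature sequences."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(original, reconstructed):
        for a, b in zip(frame_a, frame_b):
            total += (a - b) ** 2
            count += 1
    return total / count


# Toy input: three frames with two-dimensional features (hypothetical values).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
scores = [1.0, 0.0, 0.5]  # what a generator might output

weighted = weight_features(features, scores)
print(weighted)
print(sparsity_loss(scores))
print(reconstruction_loss(features, weighted))
```

In this toy setting, the discriminator would receive `features` and `weighted` as its two input types. Here the reconstruction term is simply evaluated on that same pair to show how far the weighted features drift from the originals; in a trained model the second argument would instead come from a learned reconstruction network.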