Abstract: Multi-video summarization is an effective tool for users to browse multiple videos. In this paper, multi-video summarization is formulated as a graph analysis problem, and a dynamic graph convolutional network is proposed to measure the importance and relevance of each video shot within its own video as well as within the whole video collection. Two strategies are proposed to address the class imbalance problem inherent in the video summarization task. Moreover, we propose a diversity regularization to encourage the model to generate a diverse summary. Extensive experiments are conducted, comparing the proposed method with state-of-the-art video summarization methods as well as with traditional and novel graph models. Our method achieves state-of-the-art performance on two standard video summarization datasets. The results demonstrate the effectiveness of the proposed model in generating representative summaries of multiple videos with good diversity.

What problem does this paper attempt to address?

This paper addresses several key challenges in multi-video summarization:

1. **Complex Relationship Modeling**: Multi-video summarization requires measuring the importance and relevance of each video shot both within its own video and across the entire video collection. Traditional single-video summarization methods cannot effectively capture the similarities and differences among multiple videos.
2. **Class Imbalance Problem**: The number of shots selected for the final summary (the minority class) is far smaller than the number of unselected shots (the majority class). This imbalance biases the model toward predicting the majority class, which degrades the quality of the summary.
3. **Diversity Requirement**: To produce a representative summary, the model must avoid selecting redundant content and should be encouraged to generate diverse summaries.

### Specific Solutions

To address these problems, the paper proposes a dynamic graph convolutional network (mvsDGCN). Its main contributions are:

1. **Dynamic Graph Structure**:
   - A dynamic adjacency matrix captures dependency relationships between video shots that change as learning proceeds.
   - The adjacency matrix is updated in every layer of the graph convolutional network, so that it better represents the evolving topology of the graph.
2. **Class Imbalance Strategies**: Two strategies are proposed:
   - **Bagging Strategy**: The training data are divided into multiple balanced subsets by under-sampling, and a weak classifier is trained on each subset. At test time, the classifiers are combined by ensemble learning.
   - **Penalty Loss Strategy**: A penalty term is added to the loss function to increase the penalty for misclassifying the minority class. The loss is
     \[ L(y, \hat{y}) = \lambda \frac{\sum_{i=1}^{N} \mathbb{1}(y_i^0 = 0)\, g(y_i, \hat{y}_i)}{N(y_i^0 = 0)} + (1 - \lambda) \frac{\sum_{i=1}^{N} \mathbb{1}(y_i^0 = 1)\, g(y_i, \hat{y}_i)}{N(y_i^0 = 1)}, \quad \lambda > 0.5, \]
     where \(N(y_i^0 = 0)\) and \(N(y_i^0 = 1)\) are the numbers of shots satisfying each condition, and \(g(y_i, \hat{y}_i) = -\sum_{j=0}^{\text{NumOfClass}-1} y_{ij} \ln(\hat{y}_{ij})\) is the cross-entropy loss.
3. **Diversity Regularization**: A diversity regularization term encourages the model to generate diverse summaries:
   \[ L_{\text{diversity}} = \frac{1}{N_s(N_s - 1)} \sum_{v=1}^{N_s} \sum_{u=v+1}^{N_s} \exp\left(-\frac{1}{2}\left(1 - \frac{H_M(v) \cdot H_M(u)}{\lVert H_M(v) \rVert \, \lVert H_M(u) \rVert}\right)^2\right), \]
   where \(H_M(v)\) is the output feature vector of node \(v\) in the \(M\)-th layer.

### Summary

The core objective of this paper is to tackle complex relationship modeling, class imbalance, and the diversity requirement in multi-video summarization by means of a dynamic graph convolutional network and several optimization strategies, so as to generate representative and diverse summaries.
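A minimal NumPy sketch of the two class-imbalance strategies follows. `penalty_loss` implements the penalty loss formula term by term, assuming each \(y_i\) is a one-hot row whose column 0 is the "not selected" class; under that reading, \(y_i^0 = 0\) marks a selected (minority) shot, so \(\lambda > 0.5\) up-weights the minority class, as the strategy intends. `balanced_subsets` and `majority_vote` are hypothetical names illustrating the bagging strategy; the text does not say how the majority class is partitioned or which weak classifiers are used, so here the majority indices are shuffled and split into roughly minority-sized chunks, each paired with every minority shot, and the classifier training step is left out.

```python
import numpy as np


def penalty_loss(y_onehot: np.ndarray, y_prob: np.ndarray,
                 lam: float = 0.7, eps: float = 1e-12) -> float:
    """Class-weighted cross-entropy following the penalty loss formula.

    y_onehot: (N, C) one-hot labels; column 0 is ASSUMED to be "not selected".
    y_prob:   (N, C) predicted class probabilities.
    lam:      weight on shots with y_i^0 == 0 (assumed minority); must exceed 0.5.
    """
    if lam <= 0.5:
        raise ValueError("lam must be greater than 0.5")
    # g(y_i, y_hat_i): per-shot cross-entropy, clipped to avoid log(0).
    g = -np.sum(y_onehot * np.log(np.clip(y_prob, eps, 1.0)), axis=1)
    mask0 = y_onehot[:, 0] == 0  # indicator 1(y_i^0 = 0)
    mask1 = y_onehot[:, 0] == 1  # indicator 1(y_i^0 = 1)
    # Average each group over its own count (max(..., 1) guards an empty group).
    term0 = g[mask0].sum() / max(int(mask0.sum()), 1)
    term1 = g[mask1].sum() / max(int(mask1.sum()), 1)
    return float(lam * term0 + (1.0 - lam) * term1)


def balanced_subsets(labels: np.ndarray,
                     rng: np.random.Generator) -> list[np.ndarray]:
    """Hypothetical under-sampling scheme for the bagging strategy.

    labels: (N,) array, 1 = selected (minority), 0 = not selected (majority).
    Each returned index array holds every minority shot plus one chunk of
    shuffled majority shots, so each chunk is roughly minority-sized.
    """
    minority = np.flatnonzero(labels == 1)
    majority = rng.permutation(np.flatnonzero(labels == 0))
    n_subsets = max(len(majority) // max(len(minority), 1), 1)
    chunks = np.array_split(majority, n_subsets)
    return [np.concatenate([minority, chunk]) for chunk in chunks]


def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """Ensemble (n_classifiers, N) binary predictions; ties go to class 0."""
    return (predictions.mean(axis=0) > 0.5).astype(int)


y = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
p = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]])
print(penalty_loss(y, p, lam=0.7))  # ≈ 0.4261
```

Averaging each class over its own count already removes the raw size imbalance; \(\lambda\) then shifts extra weight onto the minority term. Breaking vote ties toward "not selected" is an arbitrary choice in this sketch.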

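The dynamic graph structure and the diversity regularizer can be illustrated together, because the regularizer acts on the output features \(H_M(v)\) of the last graph layer. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the summary does not say how the per-layer adjacency is built, so the hypothetical `dynamic_adjacency` assumes a row-wise softmax over feature dot products, recomputed from each layer's input so that the topology changes as the features change. The layer weights are random rather than learned, and `diversity_loss` applies the diversity formula to whatever rows it receives, since the set of \(N_s\) nodes is not specified. The \(1/(N_s(N_s-1))\) normalization is kept exactly as written, even though the double sum covers only \(N_s(N_s-1)/2\) unordered pairs, so identical features yield 0.5 rather than 1.

```python
import numpy as np


def dynamic_adjacency(h: np.ndarray) -> np.ndarray:
    """ASSUMED per-layer adjacency: row-wise softmax of feature dot products."""
    scores = h @ h.T
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def dynamic_gcn_forward(x: np.ndarray, weights: list[np.ndarray]) -> np.ndarray:
    """Graph convolutions whose adjacency is rebuilt from each layer's input.

    x:       (N, D) features of all shots pooled from every video in the collection.
    weights: one (D_in, D_out) matrix per layer (random here, not trained).
    """
    h = x
    for layer, w in enumerate(weights):
        a = dynamic_adjacency(h)      # topology follows the current features
        h = a @ h @ w                 # aggregate neighbours, then project
        if layer < len(weights) - 1:
            h = np.maximum(h, 0.0)    # ReLU on hidden layers only
    return h


def diversity_loss(h: np.ndarray) -> float:
    """Diversity regularizer over the rows of h (assumes no all-zero rows)."""
    n_s = h.shape[0]
    if n_s < 2:
        return 0.0
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    cos = unit @ unit.T                      # pairwise cosine similarities
    v, u = np.triu_indices(n_s, k=1)         # every pair with u > v
    kernel = np.exp(-((1.0 - cos[v, u]) ** 2) / 2.0)
    # Normalization copied from the formula: 1 / (N_s (N_s - 1)).
    return float(kernel.sum() / (n_s * (n_s - 1)))


rng = np.random.default_rng(0)
shots = rng.normal(size=(6, 8))  # e.g. 6 shot features pooled from two videos
layers = [0.1 * rng.normal(size=(8, 16)), 0.1 * rng.normal(size=(16, 2))]
h_m = dynamic_gcn_forward(shots, layers)
print(h_m.shape)  # (6, 2)
print(diversity_loss(h_m))
```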
Dynamic graph convolutional network for multi-video summarization

A Novel Compact Yet Rich Key Frame Creation Method for Compressed Video Summarization

An Unsupervised Video Summarization Method Based on Multimodal Representation

A GAN Based Video Summarization Method with Representation Loss

Video Summarization Generation Network Based on Dynamic Graph Contrastive Learning and Feature Fusion

An Interactive Personalized Video Summarization Based on Sketches

Relational Reasoning over Spatial-Temporal Graphs for Video Summarization

Multi-View Video Summarization

Multi-video Summarization Using Complex Graph Clustering and Mining

Exploring global diverse attention via pairwise temporal relation for video summarization

Feature fusion over hyperbolic graph convolution networks for video summarisation

Multi-modal Summarization for Video-containing Documents

Category Driven Deep Recurrent Neural Network for Video Summarization

Conditional Modeling Based Automatic Video Summarization

Reconstructive Sequence-Graph Network for Video Summarization

Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network

Query-based video summarization with multi-label classification network

Video summarization via knowledge-aware multimodal deep networks

MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization

A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning

Deep Semantic and Attentive Network for Unsupervised Video Summarization