Dynamic graph convolutional network for multi-video summarization

Jiaxin Wu, Sheng-hua Zhong, Yan Liu
DOI: https://doi.org/10.1016/j.patcog.2020.107382
IF: 8
2020-11-01
Pattern Recognition
Abstract: Multi-video summarization is an effective tool for users to browse multiple videos. In this paper, multi-video summarization is formulated as a graph analysis problem and a dynamic graph convolutional network is proposed to measure the importance and relevance of each video shot in its own video as well as in the whole video collection. Two strategies are proposed to solve the inherent class imbalance problem of video summarization task. Moreover, we propose a diversity regularization to encourage the model to generate a diverse summary. Extensive experiments are conducted, and the comparisons are carried out with the state-of-the-art video summarization methods, the traditional and novel graph models. Our method achieves state-of-the-art performances on two standard video summarization datasets. The results demonstrate the effectiveness of our proposed model in generating a representative summary for multiple videos with good diversity.
computer science, artificial intelligence; engineering, electrical & electronic
What problem does this paper attempt to address?
This paper addresses several key challenges in multi-video summarization:

1. **Complex Relationship Modeling**: Multi-video summarization requires measuring the importance and relevance of each video segment both within its own video and across the entire video collection. Traditional single-video summarization methods cannot effectively handle the similarities and differences between multiple videos.
2. **Class Imbalance Problem**: The number of segments selected for the final summary (the minority class) is far smaller than the number of unselected segments (the majority class). This imbalance biases the model toward predicting the majority class, which degrades the quality of the summary.
3. **Diversity Requirement**: To generate a representative summary, the model must avoid selecting repetitive content and be encouraged to produce diverse summaries.

### Specific Solutions

To solve these problems, the paper proposes a dynamic graph convolutional network (mvsDGCN). Its main contributions are as follows:

1. **Dynamic Graph Structure**:
   - Use a dynamic graph adjacency matrix to capture the dependency relationships between video segments, which change during the learning process.
   - Update the adjacency matrix in each layer of the graph convolutional network to better represent the dynamic topological structure of the graph.
2. **Class Imbalance Strategy**: Two strategies are proposed to solve the class imbalance problem:
   - **Bagging Strategy**: Divide the training data into multiple balanced data sets through under-sampling and train multiple weak classifiers. Use ensemble learning for classification in the testing phase.
   - **Penalty Loss Strategy**: Add a penalty term to the loss function to increase the penalty for misclassifying the minority class.
     The formula is as follows:
     \[ L(y, \hat{y}) = \lambda \frac{\sum_{i=1}^{N} 1(y_{i}^{0}=0)\, g(y_{i}, \hat{y}_{i})}{N(y_{i}^{0}=0)} + (1-\lambda) \frac{\sum_{i=1}^{N} 1(y_{i}^{0}=1)\, g(y_{i}, \hat{y}_{i})}{N(y_{i}^{0}=1)}, \quad \lambda > 0.5 \]
     where \(g(y_{i}, \hat{y}_{i}) = -\sum_{j=0}^{\text{NumOfClass}-1} y_{ij} \ln(\hat{y}_{ij})\) is the cross-entropy loss.
3. **Diversity Regularization**: Introduce a diversity regularization term to encourage the model to generate diverse summaries. The formula is as follows:
   \[ L_{\text{diversity}} = \frac{1}{N_{s}(N_{s}-1)} \sum_{v=1}^{N_{s}} \sum_{u=v+1}^{N_{s}} \exp\left(-\frac{\left(1 - \frac{H_{M}(v)\cdot H_{M}(u)}{\vert H_{M}(v)\vert \vert H_{M}(u)\vert}\right)^{2}}{2}\right) \]
   where \(H_{M}(v)\) is the output feature vector of node \(v\) in the \(M\)-th layer.

### Summary

The core objective of this paper is to address complex relationship modeling, class imbalance, and the diversity requirement in multi-video summarization through a dynamic graph convolutional network and several optimization strategies, so as to generate representative and diverse summaries.
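
As a rough illustration of the penalty loss and diversity regularization formulas given under Specific Solutions, the following is a minimal pure-Python sketch, not the authors' implementation. It rests on several assumptions that the text above does not confirm: \(y_{i}^{0}\) is read as the first component of the one-hot label of shot \(i\); a class with no samples contributes zero to the loss; \(N_{s}\) is taken to be the number of feature vectors passed in; and the outer sum of the diversity term runs over all nodes. The function names and the default `lam=0.7` are hypothetical placeholders.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """g(y_i, y_hat_i) = -sum_j y_ij * ln(y_hat_ij) for a one-hot label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def _mean(values):
    # Assumption: an empty class contributes 0 instead of raising an error.
    return sum(values) / len(values) if values else 0.0


def penalty_loss(labels, preds, lam=0.7):
    """Class-weighted cross-entropy in the shape of the penalty loss formula.

    labels: one-hot lists [y^0, y^1] per shot (assumed layout).
    preds:  predicted probabilities in the same layout.
    lam:    weight on shots whose first label component is 0; must be > 0.5.
    """
    if not lam > 0.5:
        raise ValueError("lam must be greater than 0.5")
    losses = [cross_entropy(y, p) for y, p in zip(labels, preds)]
    first_is_zero = [g for y, g in zip(labels, losses) if y[0] == 0]
    first_is_one = [g for y, g in zip(labels, losses) if y[0] == 1]
    return lam * _mean(first_is_zero) + (1 - lam) * _mean(first_is_one)


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def diversity_loss(features):
    """Sum of exp(-(1 - cos)^2 / 2) over node pairs, scaled by 1 / (N_s (N_s - 1))."""
    n = len(features)
    if n < 2:
        return 0.0
    total = 0.0
    for v in range(n):
        for u in range(v + 1, n):
            c = cosine(features[v], features[u])
            total += math.exp(-((1.0 - c) ** 2) / 2.0)
    return total / (n * (n - 1))
```

Under these assumptions, shots whose first label component is 0 (the class-1 shots in a two-class one-hot encoding) receive the larger weight `lam`, and the diversity term is largest when the feature vectors point in the same direction, so minimizing it pushes the selected features apart.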