CSMB-VSS: video scene segmentation with cosine similarity matrix

Zeyu Chen, Xinbo Wang, Ji Wang, Yi Zhang, Xiang Cao
DOI: https://doi.org/10.1007/s11042-023-17985-0
IF: 2.577
2024-01-07
Multimedia Tools and Applications
Abstract: Video scene segmentation is a crucial step in video structural analysis: it divides a long video into discrete scenes, each consisting of a series of semantically coherent shots. The task is to identify the locations of scene boundaries in a shot sequence. Existing algorithms primarily treat this as a token classification problem. However, because current video scene segmentation datasets are small and video embeddings contain abundant redundant, scene-irrelevant information, this approach lacks prior knowledge, which makes the learning process uninterpretable and difficult to control. To address this issue, we propose a cosine similarity matrix-based video scene segmentation (CSMB-VSS) algorithm, which uses the relationship between video scene segmentation and shot similarity as prior information and yields a significant improvement. First, we use self-supervised learning to map shot features to the scene space for feature adjustment, and propose dynamic programming with nearest-neighbor or clustering methods to generate pseudo-scenes for training. Then, we generate a similarity matrix from the adjusted features and use a convolutional neural network to mine the typical patterns of scene boundaries around the diagonal of the similarity matrix. On the official MovieNet-SSeg video scene segmentation dataset, CSMB-VSS achieves an average precision (AP) 3.4 higher than the state of the art (SOTA). We also explore different ways of using the similarity matrix in scene boundary detection and find that each is suited to different feature adjustment methods; the paper analyzes this in detail.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
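
The similarity-matrix step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (cosine_similarity_matrix, diagonal_windows, block_contrast_score), the window half-width, and the synthetic shot features are all assumptions made for illustration, and the hand-crafted contrast score is only a stand-in for the convolutional network that the paper trains on patches around the diagonal.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between shot feature vectors (rows)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def diagonal_windows(sim, half_width):
    """Crop one square patch around the diagonal per candidate boundary.

    Candidate boundary i lies between shot i and shot i + 1. Its patch
    covers shots i - half_width + 1 .. i + half_width, zero-padded at the
    ends of the sequence (hypothetical choice, not taken from the paper).
    """
    n = sim.shape[0]
    size = 2 * half_width
    padded = np.pad(sim, half_width, mode="constant")
    patches = [
        padded[i + 1 : i + 1 + size, i + 1 : i + 1 + size]
        for i in range(n - 1)
    ]
    return np.stack(patches)


def block_contrast_score(patch):
    """Hand-crafted stand-in for the CNN: within-side minus cross-side similarity."""
    k = patch.shape[0] // 2
    within = (patch[:k, :k].mean() + patch[k:, k:].mean()) / 2.0
    across = patch[:k, k:].mean()
    return within - across


# Synthetic example (assumption): 8 shots, two scenes of 4 shots each.
scene_a = np.tile([1.0, 0.0], (4, 1))
scene_b = np.tile([0.0, 1.0], (4, 1))
shots = np.vstack([scene_a, scene_b])

sim = cosine_similarity_matrix(shots)
windows = diagonal_windows(sim, half_width=2)
scores = np.array([block_contrast_score(w) for w in windows])

# The highest-scoring candidate is the boundary between shot 3 and shot 4.
best_boundary = int(np.argmax(scores))
print(best_boundary)
```

In this toy setting the similarity matrix is block-diagonal, so a scene boundary appears as the corner where two bright blocks meet along the diagonal. The fixed contrast score picks that corner up directly; a learned detector would replace block_contrast_score with a model applied to each patch.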