User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction

Xinhui Peng, Rui Li, Jilong Wang, Hao Shang
DOI: https://doi.org/10.1109/access.2019.2946889
IF: 3.9
2019-01-01
IEEE Access
Abstract: Video segmentation is the task of temporally dividing a video into semantic sections, each typically based on a specific concept or theme that is usually defined by the user's intention. However, previous studies of video segmentation have thus far not taken the user's intention into consideration. In this paper, a two-stage user-guided video segmentation framework is presented, consisting of dimension reduction and temporal clustering. During the dimension reduction stage, coarse-grained feature extraction is performed by a deep convolutional neural network pre-trained on ImageNet. In the temporal clustering stage, information about the user's intention is used to segment videos in the time domain with a proposed operator, which calculates the similarity distance between dimension-reduced frames. To provide more insight into the videos, a hierarchical clustering method that allows users to segment videos at different granularities is proposed. Evaluation on the Open Video Scene Detection (OVSD) dataset shows that the average F-score achieved by the proposed method is 0.72, even though only coarse-grained feature extraction is adopted. The evaluation also demonstrates that the proposed method not only produces different segmentation results according to the user's intention, but also produces hierarchical segmentation results from a low level to a higher level of abstraction.
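The temporal clustering stage described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not define the proposed operator or how the user's intention is encoded, so this sketch assumes cosine distance between mean feature vectors and treats the user's input as nothing more than a choice of granularity (the number of segments). The function names `cosine_distance`, `mean_vector`, and `temporal_hierarchy` are hypothetical, and the toy 2-D vectors stand in for CNN features of successive frames.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity (an assumed stand-in for the paper's operator)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def temporal_hierarchy(features):
    """Greedily merge temporally adjacent segments.

    Returns a dict mapping each segment count k to a list of
    half-open (start, end) frame ranges, from one segment per
    frame down to a single segment covering the whole video.
    """
    segments = [(i, i + 1) for i in range(len(features))]
    levels = {len(segments): list(segments)}
    while len(segments) > 1:
        means = [mean_vector(features[s:e]) for s, e in segments]
        dists = [
            cosine_distance(means[i], means[i + 1])
            for i in range(len(segments) - 1)
        ]
        i = dists.index(min(dists))  # closest pair of neighbouring segments
        segments[i:i + 2] = [(segments[i][0], segments[i + 1][1])]
        levels[len(segments)] = list(segments)
    return levels


# Toy per-frame features (hypothetical): three visually distinct scenes.
frames = [[1, 0], [1, 0.1], [0, 1], [0.1, 1], [-1, 0], [-1, 0.1]]
levels = temporal_hierarchy(frames)
print(levels[3])  # fine granularity chosen by the user
print(levels[2])  # coarser granularity
```

Because only neighbouring segments are merged, every level of the hierarchy is a valid temporal segmentation, and a user can move between coarse and fine results by picking a different key in `levels`.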