Modality correlation-based video summarization
Xingrun Wang, Xiushan Nie, Xingbo Liu, Binze Wang, Yilong Yin
DOI: https://doi.org/10.1007/s11042-020-08690-3
IF: 2.577
2020-03-03
Multimedia Tools and Applications
Abstract: Video summarization, which extracts frames or shots from the original video, is an important technique for browsing, storing, and retrieving a rapidly increasing amount of video data. Text information covers important content of a video, so a summary can be generated by exploring the correlation between frames and text. In this study, we propose a video summarization method based on modality correlation. With this method, we first learn the correlation between text and frames in each respective space, and then fuse the two correlations to obtain an importance score for each shot. Finally, the video shots with high importance scores are chosen as the video summary. Previous methods seldom apply text to generate the video summary, or use only the latent common information between text and frames. In contrast, the proposed method utilizes both the latent common information and the modality-specific information. Experiments were conducted on the TVSum50 dataset, and the results verify the effectiveness of the proposed approach.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
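
The pipeline outlined in the abstract (a text-frame correlation per space, fusion of the two correlations, shot-level importance scores, selection of high-scoring shots) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: cosine similarity stands in for the learned correlation, the fusion is a plain weighted sum with a hypothetical weight `alpha`, shot scores are frame averages, and selection is greedy under a hypothetical length `budget`. The function names and the inputs `frames_a`, `text_a`, `frames_b`, `text_b` (features of the same frames and text in two different spaces) are illustrative only.

```python
import numpy as np


def cosine_to_query(features, query):
    """Cosine similarity between each row of `features` and the vector `query`."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    q = query / (np.linalg.norm(query) + 1e-12)
    return f @ q


def shot_scores(frame_scores, shot_bounds):
    """Average the per-frame scores inside each (start, end) shot, end exclusive."""
    return np.array([frame_scores[s:e].mean() for s, e in shot_bounds])


def summarize(frames_a, text_a, frames_b, text_b, shot_bounds,
              alpha=0.5, budget=0.15):
    """Return (shot scores, sorted indices of the selected shots).

    Assumptions, not taken from the paper: cosine similarity replaces the
    learned correlation, fusion is a weighted sum, and shots are picked
    greedily by score until `budget` (a fraction of all frames) is used up.
    """
    # Text-frame correlation computed separately in each space.
    corr_a = cosine_to_query(frames_a, text_a)
    corr_b = cosine_to_query(frames_b, text_b)

    # Fuse the two correlations into one per-frame importance score.
    fused = alpha * corr_a + (1.0 - alpha) * corr_b

    # Aggregate to shot level.
    scores = shot_scores(fused, shot_bounds)

    # Greedy selection: highest-scoring shots first, skipping any that
    # would exceed the length budget.
    lengths = np.array([e - s for s, e in shot_bounds])
    limit = budget * lengths.sum()
    chosen, used = [], 0
    for i in np.argsort(-scores):
        if used + lengths[i] <= limit:
            chosen.append(int(i))
            used += lengths[i]
    return scores, sorted(chosen)
```

A real system would replace `cosine_to_query` with correlations learned from data and would likely use a knapsack-style selection instead of the greedy loop; the sketch only shows how the stages connect.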