A long video caption generation algorithm for big video data retrieval

Songtao Ding, Shiru Qu, Yuling Xi, Shaohua Wan
DOI: https://doi.org/10.1016/j.future.2018.10.054
IF: 7.307
2019-01-01
Future Generation Computer Systems
Abstract: Videos captured by people are often tied to important moments in their lives. But with the arrival of the big data era, the time required to retrieve and watch them can be daunting. In this paper, novel techniques are proposed for long video segmentation, which can effectively shorten retrieval time. The motion extent of a long video is detected by an improved spatio-temporal interest point (STIP) detection algorithm. The filtered long video is then segmented into superframes to obtain its interesting clips. To select keyframes, regions of interest are constructed from the STIPs already detected in the video clips, and saliency detection on these regions is used to select the video keyframes. Finally, we generate video captions by adding attention vectors to a traditional LSTM. Our method is benchmarked on the VideoSet dataset and evaluated with the BLEU, METEOR and ROUGE metrics.
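The abstract says only that attention vectors are added to a traditional LSTM; it does not give the exact formulation. As an illustration, the sketch below assumes a standard additive (Bahdanau-style) soft attention over keyframe feature vectors: at each decoding step, every keyframe is scored against the LSTM hidden state, the scores are normalized with a softmax, and the weighted sum of keyframe features becomes a context vector. In a typical design this context vector is combined with the word embedding as the LSTM input for that step. The function and parameter names (`additive_attention`, `W_f`, `W_h`, `w`) are hypothetical, not from the paper. The code uses pure Python to stay dependency-free, whereas a real model would use a tensor library and learn these weights.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in W]


def additive_attention(frame_feats: List[Vector], hidden: Vector,
                       W_f: Matrix, W_h: Matrix, w: Vector) -> Tuple[List[float], Vector]:
    """One decoding step of additive soft attention (hypothetical sketch).

    score_i = w . tanh(W_f f_i + W_h h); weights = softmax(scores);
    context = sum_i weights_i * f_i.
    """
    proj_h = matvec(W_h, hidden)  # project the LSTM hidden state once
    scores = []
    for f in frame_feats:
        proj_f = matvec(W_f, f)  # project each keyframe feature
        scores.append(dot(w, [math.tanh(a + b) for a, b in zip(proj_f, proj_h)]))
    weights = softmax(scores)
    dim = len(frame_feats[0])
    # Context vector: attention-weighted sum of the keyframe features.
    context = [sum(a * f[d] for a, f in zip(weights, frame_feats)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy keyframe features
    h = [0.2, -0.1]                                 # toy LSTM hidden state
    W_f = [[1.0, 0.0], [0.0, 1.0]]
    W_h = [[0.5, 0.0], [0.0, 0.5]]
    w = [1.0, -1.0]
    weights, ctx = additive_attention(feats, h, W_f, W_h, w)
    print([round(a, 3) for a in weights], [round(c, 3) for c in ctx])
```

Because the weights are recomputed from the hidden state at every step, the decoder can focus on different keyframes while emitting different words of the caption.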