Story-driven Video Editing

Zheng Wang, Jianguo Li, Yu-Gang Jiang
DOI: https://doi.org/10.1109/tmm.2020.3037461
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: This paper proposes a novel multimedia task: story-driven video editing. Given a story paragraph, the task is to retrieve related video segments from a gallery of collected segments and compose them into a video sequence that follows the storyline order. Our proposed baseline solution consists of three modules: a retrieval module, which returns lists of candidate segments for all query sentences in the story paragraph using an object-aware sentence-segment matching method; a sequence candidate proposal module, which aggregates the retrieved segment sets into a sequence proposal using submodular optimization; and a sorting module, which arranges the candidates according to the storyline of the paragraph using a Sinkhorn network. We build a benchmark for this task that includes a reorganized version of the ActivityNet Captions dataset and a well-defined quantitative metric, Evaluation of Segment-to-Sequence Matching (ESSM), which measures the difference between the generated video segment sequence and the ground truth. Quantitative results of the proposed baseline solution are reported. We hope this new task and benchmark will attract broad research attention and advance novel online short-video editing applications.
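The sorting step mentioned in the abstract relies on a Sinkhorn network. As a rough illustration only, the sketch below shows the Sinkhorn normalization operator that such networks are typically built on: it turns a segment-by-position score matrix into an approximately doubly stochastic matrix, from which an ordering can be read off. This is not the authors' model; the score values, the function names `sinkhorn` and `greedy_order`, and the greedy decoding step are all assumptions made for illustration, and in a real system the scores would come from a learned network.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis, keeping dimensions.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Alternately normalize rows and columns in log space.

    scores[i, j] is a hypothetical score for placing segment i at position j.
    Returns a matrix whose rows and columns each sum to approximately 1.
    """
    log_p = np.asarray(scores, dtype=float) / temperature
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # row normalization
        log_p = log_p - _logsumexp(log_p, axis=0)  # column normalization
    return np.exp(log_p)


def greedy_order(p):
    """Decode an ordering by repeatedly taking the largest remaining entry.

    Returns a list where order[position] is the index of the chosen segment.
    This greedy decoding is an illustrative assumption, not the paper's method.
    """
    p = np.array(p, dtype=float)
    n = p.shape[0]
    order = [-1] * n
    for _ in range(n):
        i, j = np.unravel_index(np.argmax(p), p.shape)
        order[j] = int(i)
        p[i, :] = -1.0  # segment i is used
        p[:, j] = -1.0  # position j is filled
    return order


# Hypothetical scores: segment 0 fits position 2, segment 1 fits position 0,
# and segment 2 fits position 1.
scores = np.array([[0.0, 0.0, 5.0],
                   [5.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0]])
p = sinkhorn(scores)
order = greedy_order(p)
```

Lowering `temperature` pushes the normalized matrix closer to a hard permutation, which is the usual reason this operator is used as a differentiable stand-in for sorting.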
computer science, information systems, telecommunications, software engineering
What problem does this paper attempt to address?