Video Summarization Method Integrating Universal Demand Elements

Jianglei Tong, Xiaolin Gui, Xiaoyu Teng
DOI: https://doi.org/10.2139/ssrn.4747970
2024-01-01
Abstract: Current video summarization algorithms do not link demand analysis to video elements, so their summaries often fail to cover user needs. This paper proposes a video summarization method that integrates universal demand elements. The method first parses coarse-grained user demands into four universal demand elements: characters, locations, representative objects, and events. It then uses a ConvLSTM-based feature extraction module to explore the spatiotemporal relationships of these demand elements within keyframes. A main-storyline detection module is built by combining a spatial pyramid with a multi-head attention mechanism. The paper also introduces a storyline relevance evaluation metric that assesses the similarity between machine-generated and actual summaries in terms of universal demand elements. Evaluation on two benchmark datasets demonstrates the method's feasibility and effectiveness, along with superior transferability.
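The abstract does not define the storyline relevance metric, so the following is only an illustrative sketch of how a similarity over the four universal demand elements could be computed. It assumes each summary has already been reduced to a set of labels per element type and uses a plain Jaccard overlap averaged across types. The names `ELEMENT_TYPES`, `jaccard`, and `element_overlap_score` are hypothetical, and this is not the paper's metric.

```python
from typing import Dict, Set

# The four universal demand elements named in the abstract.
ELEMENT_TYPES = ("characters", "locations", "objects", "events")


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two label sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def element_overlap_score(
    generated: Dict[str, Set[str]],
    reference: Dict[str, Set[str]],
) -> float:
    """Average per-element Jaccard overlap (an assumed stand-in metric)."""
    scores = [
        jaccard(generated.get(t, set()), reference.get(t, set()))
        for t in ELEMENT_TYPES
    ]
    return sum(scores) / len(scores)


# Hypothetical label sets for a machine-generated and a reference summary.
generated = {
    "characters": {"alice", "bob"},
    "locations": {"kitchen"},
    "objects": {"cup"},
    "events": {"cooking"},
}
reference = {
    "characters": {"alice"},
    "locations": {"kitchen"},
    "objects": {"cup", "knife"},
    "events": {"cooking"},
}
score = element_overlap_score(generated, reference)
```

A per-element average keeps one element type with many labels from dominating the score; a weighted average would be a natural variation if some elements matter more to the user's demand.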