VSRNet: End-to-end Video Segment Retrieval with Text Query

Xiao Sun, Xiang Long, Dongliang He, Shilei Wen, Zhouhui Lian
DOI: https://doi.org/10.1016/j.patcog.2021.108027
IF: 8
2021-01-01
Pattern Recognition
Abstract:
• We propose a novel framework that combines video retrieval and segment localization in one network; joint training improves the performance of each task.
• We introduce a text-aligned attention mechanism to efficiently generate temporal proposals and a collaborative ranking strategy to improve the performance of video segment retrieval.
• Extensive experiments on DiDeMo and ActivityNet Captions demonstrate the superiority of our method on the video segment retrieval (VSR) task.