Cross-modal Video Moment Retrieval Based on Enhancing Significant Features

YANG Jinfu, LIU Yubin, SONG Lin, YAN Xue
DOI: https://doi.org/10.11999/JEIT211101
2022-01-01
Abstract: With the continuous development of video acquisition equipment and technology, the number of videos has grown rapidly. Accurately locating target moments in massive video collections is a challenging task in video retrieval. Cross-modal video moment retrieval aims to find the moment in a video database that matches the query. Existing works mostly focus on matching the text with the moment while ignoring the context in adjacent moments, so the relations between features are insufficiently expressed. In this paper, a novel moment retrieval network is proposed that highlights significant features through residual channel attention. In addition, a temporal adjacent network is designed to capture the context information of adjacent moments. Experimental results show that the proposed method achieves better performance than mainstream methods based on candidate matching and on video-text feature relations.
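The abstract does not give the network's details, so the following is only a minimal sketch of what residual channel attention generally looks like, not the authors' implementation. It assumes a squeeze-and-excitation style gate (average pooling over time, a two-layer bottleneck, a sigmoid) whose output is added back to the input. The function name `residual_channel_attention` and the weight arguments `w_down` and `w_up` are hypothetical names introduced for illustration.

```python
import math


def residual_channel_attention(features, w_down, w_up):
    """Illustrative residual channel attention (assumed SE-style gating).

    features: C x T nested list, one row of temporal values per channel.
    w_down:   R x C weights of the bottleneck layer (hypothetical).
    w_up:     C x R weights of the expansion layer (hypothetical).
    """
    # Squeeze: average each channel over the temporal axis.
    pooled = [sum(channel) / len(channel) for channel in features]

    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [
        max(0.0, sum(w * p for w, p in zip(row, pooled)))
        for row in w_down
    ]

    # Excitation, step 2: expansion followed by a sigmoid, one gate per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_up
    ]

    # Residual connection: add the re-weighted features back onto the input,
    # so channels with large gates are amplified and none are zeroed out.
    return [
        [value + gate * value for value in channel]
        for channel, gate in zip(features, gates)
    ]
```

With gates close to 1 a channel is roughly doubled, and with gates close to 0 it passes through almost unchanged, which is one way significant channels can be emphasized without discarding the rest.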