An Adaptive Video Clip Sampling Approach for Enhancing Query-Based Moment Retrieval in Videos.

Lingdu Kong, Tieying Li, Xiaochun Yang, Shengzhi Han, Bin Wang
DOI: https://doi.org/10.1007/978-3-031-30675-4_28
2023-01-01
Abstract: Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to a given natural language query. Existing retrieval models require inputs of the same length for ease of training and use, so videos of different lengths are pre-processed with a fixed sampling method. As a result, the longer the video, the more video clips are lost, which reduces retrieval accuracy. We observe that fixed sampling causes two accuracy issues: missing clips and sparse clips. In this paper, we propose an adaptive video clip sampling method that resamples missing clips and enhances sparsely sampled clips to increase retrieval accuracy. Resampling missing clips addresses situations in which annotated clips are completely lost during fixed sampling. Enhancing sparsely sampled clips prevents clips that share the same semantics from being sampled too sparsely. Our approach first obtains multiple video features through the adaptive sampling methods based on the backbone networks. We then propose a consistency loss maintenance method to learn the semantics of the adaptively sampled features. Extensive experiments on three real datasets demonstrate the effectiveness of the proposed method, especially for long videos.
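The "missing clips" issue can be illustrated with a minimal sketch. It assumes fixed sampling means picking a fixed number of clip indices at a uniform stride; the abstract does not specify the exact scheme, and the function names and numbers below are hypothetical, not taken from the paper.

```python
def fixed_sample_indices(num_clips: int, num_samples: int) -> list[int]:
    """Pick num_samples clip indices at a uniform stride (assumed scheme)."""
    if num_clips <= num_samples:
        return list(range(num_clips))
    step = num_clips / num_samples
    return [int(i * step) for i in range(num_samples)]


def annotated_clips_kept(num_clips: int, num_samples: int,
                         start: int, end: int) -> int:
    """Count sampled clips that fall inside the annotated moment [start, end]."""
    indices = fixed_sample_indices(num_clips, num_samples)
    return sum(start <= i <= end for i in indices)


# Hypothetical numbers: the same 4-clip moment in a short and a long video.
short_kept = annotated_clips_kept(num_clips=64, num_samples=32, start=10, end=13)
long_kept = annotated_clips_kept(num_clips=1024, num_samples=32, start=10, end=13)
print(short_kept, long_kept)
```

Under these assumptions the short video keeps two clips of the annotated moment, while the long video keeps none, which is the case that resampling missing clips is described as addressing.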