Efficient Proposal Generation with U-shaped Network for Temporal Sentence Grounding

Ludan Ruan,Qin Jin
DOI: https://doi.org/10.1145/3469877.3490606
2021-01-01
Abstract:Temporal Sentence Grounding aims to localize the relevant temporal region in a given video according to the query sentence. It is a challenging task due to the semantic gap between different modalities and diversity of the event duration. Proposal generation plays an important role in previous mainstream methods. However, previous proposal generation methods apply the same feature extraction without considering the diversity of event duration. In this paper, we propose a novel temporal sentence grounding model with an U-shaped Network for efficient proposal generation (UN-TSG), which utilizes U-shaped structure to encode proposals of different lengths hierarchically. Experiments on two benchmark datasets demonstrate that with more efficient proposal generation method, our model can achieve the state-of-the-art grounding performance in faster speed and with less computation cost.
What problem does this paper attempt to address?