Reducing Intrinsic and Extrinsic Data Biases for Moment Localization with Natural Language

Jiong Yin, Liang Li, Jiehua Zhang, Chenggang Yan, Lei Zhang, Zunjie Zhu
DOI: https://doi.org/10.1145/3581783.3612357
2023-01-01
Abstract: Moment Localization with Natural Language (MLNL) aims to locate the target moment in an untrimmed video given a linguistic query. Recent works reveal a severe data bias problem in MLNL and point out that models may simply fit the timestamp distribution instead of understanding the multi-modal content. In this paper, we study data biases from intrinsic and extrinsic aspects: the former is mainly caused by the ambiguity of moment boundaries and the information imbalance between input and output; the latter results from the long-tail distribution of moments in MLNL datasets. To alleviate these biases, we propose a hybrid multi-modal debiasing network with a temporal consistency constraint for MLNL. Specifically, we first design a multi-temporal Transformer that mitigates boundary ambiguity by integrating frame-wise features into segment-wise features and dynamically matching them with moment boundaries. Then, we introduce a temporal consistency constraint that highlights action information in complex moment content to overcome the intrinsic bias arising from information imbalance. Furthermore, we design a hybrid linguistic activating module with external knowledge to relieve the extrinsic bias; it introduces prior guidance that focuses on the discriminative information in tail samples. Extensive experiments on three public datasets demonstrate that our model outperforms existing methods.
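The claim that a model can "fit the timestamp distribution" without understanding the video or the query can be made concrete with a prior-only ("blind") baseline. The Python sketch below is an illustration of that bias, not the authors' method. It assumes that each moment is a (start, end) pair normalized by video duration, that candidate moments lie on a uniform grid, and that accuracy is measured as recall of a single prediction at an IoU threshold, similar to the R@1 metric commonly reported for MLNL. The function names `temporal_iou`, `fit_prior_moment`, and `recall_at_iou`, as well as the toy annotations, are hypothetical.

```python
from itertools import product

# Assumption: a moment is a (start, end) pair normalized by video duration to [0, 1].
Moment = tuple[float, float]


def temporal_iou(a: Moment, b: Moment) -> float:
    """Temporal intersection-over-union of two moments (0.0 if disjoint)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def fit_prior_moment(train_moments: list[Moment], num_bins: int = 10) -> Moment:
    """Hypothetical blind baseline: return the grid moment with the highest
    mean IoU against the training annotations. It never looks at a video or
    a query, so any accuracy it reaches comes from the timestamp prior alone."""
    grid = [i / num_bins for i in range(num_bins + 1)]
    candidates = [(s, e) for s, e in product(grid, grid) if e > s]
    return max(
        candidates,
        key=lambda c: sum(temporal_iou(c, m) for m in train_moments) / len(train_moments),
    )


def recall_at_iou(pred: Moment, test_moments: list[Moment], threshold: float = 0.5) -> float:
    """Fraction of test moments that the single prediction `pred` matches
    with IoU >= threshold (the same prediction is reused for every query)."""
    hits = sum(temporal_iou(pred, m) >= threshold for m in test_moments)
    return hits / len(test_moments)


if __name__ == "__main__":
    # Hypothetical toy annotations that cluster near the start of each video.
    train = [(0.0, 0.3), (0.0, 0.4), (0.05, 0.35), (0.1, 0.4), (0.0, 0.25)]
    test = [(0.0, 0.35), (0.05, 0.3), (0.6, 0.9)]
    pred = fit_prior_moment(train)
    print("prior-only prediction:", pred)
    print(f"recall at IoU >= 0.5: {recall_at_iou(pred, test):.3f}")
```

On a skewed split, a baseline like this can match many test moments while ignoring all visual and linguistic content. Comparing a model against it therefore gives a rough indication of how much of the model's accuracy might stem from the timestamp prior rather than from multi-modal understanding.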