Context-Aware Relational Reasoning for Video Chunks and Frames Overlapping in Language-Based Moment Localization

Hafiza Sadia Nawaz, Daming Shi, Munaza Nawaz
DOI: https://doi.org/10.1016/j.neucom.2024.128224
IF: 6
2024-01-01
Neurocomputing
Abstract: The goal of language-based moment localization (LBML) is to locate the moment in an untrimmed video that corresponds to an input query. Because of erroneous correlations between modalities, existing LBML methods frequently fail to distinguish between similar but confusing overlapping moments in an untrimmed video. In addition, long videos are hard to comprehend, and the visual overlap encountered during localization is challenging to interpret. To localize the correct moment, this paper attempts to identify the critical overlaps between video chunks and frames that cause network errors. We provide context-aware relational reasoning for overlapping video chunks and frames in language-based moment localization in untrimmed videos. We call our network the Useful Overlap Moments Rectifier Network (UOMR-Net). UOMR-Net consists of three main modules. First, before extracting the adjacent frames, which we call video chunks, the Query-Based Filtration module identifies the useful and useless overlapping video chunks and frames. It then refines each video chunk by combining it with the global feature representation of the query, to capture the query semantics that match the video chunk. Second, the Scene Context Overlap Distinguisher module identifies which frames and video chunks have a greater association with the input query; it further consists of two sub-modules: (1) a video chunk overlapping separator and (2) a frame overlapping separator. Third, the Moment Localization and Contrastive Learning module explains the context-aware relational reasoning behind the overlapping moments and also outputs the start and end boundaries of the moment. Experiments on the Charades-STA, TACoS, and ActivityNet Captions datasets demonstrate that our framework outperforms state-of-the-art methods.
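The abstract does not specify how the Query-Based Filtration module scores or fuses features, so the following is only a minimal illustrative sketch under stated assumptions, not the UOMR-Net implementation. It assumes each video chunk and the query are already encoded as fixed-length feature vectors, uses cosine similarity as a stand-in relevance score, and uses an element-wise product as a stand-in for "combining" a chunk with the query global feature. The names `cosine`, `filter_and_fuse`, and `threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_and_fuse(chunks, query, threshold=0.5):
    """Keep chunks that look relevant to the query and fuse them with it.

    Assumption: relevance is cosine similarity and fusion is an
    element-wise product. Both are placeholders for whatever the
    paper's module actually learns.
    """
    kept = []
    for idx, chunk in enumerate(chunks):
        score = cosine(chunk, query)
        if score >= threshold:  # "useful" chunk under this toy criterion
            fused = [c * q for c, q in zip(chunk, query)]
            kept.append((idx, score, fused))
    return kept


# Toy usage: three 2-D chunk features and one query feature.
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
kept = filter_and_fuse(chunks, query, threshold=0.5)
```

In this toy setting the second chunk is orthogonal to the query and is discarded, while the first and third are kept and fused. A learned model would replace both the fixed threshold and the similarity function.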