PcmNet: Position-sensitive Context Modeling Network for Temporal Action Localization

Xin Qin, Hanbin Zhao, Guangchen Lin, Hao Zeng, Songcen Xu, Xi Li
DOI: https://doi.org/10.1016/j.neucom.2022.08.040
IF: 6
2022-01-01
Neurocomputing
Abstract: Temporal action localization, which aims to locate the temporal regions where actions take place and recognize their corresponding classes in untrimmed real-world videos, is a challenging task. As a critical cue for video understanding, video context has become an important means of boosting localization performance. However, previous methods mainly focus on semantic context, which captures the feature similarity among frames or proposals. Temporal position context, which is also vital for temporal action localization, is less explored. In this paper, we propose a position-sensitive context modeling approach that fuses semantic and position context for more precise action localization. Specifically, we first propose a position encoding method tailored for temporal action localization at both the frame level and the proposal level, which ensures that the generated position representations can model the distance and chronological relationships among frames or proposals. We then conduct attention-based context aggregation to produce discriminative features that help with precise boundary detection and proposal evaluation. Our method achieves state-of-the-art performance on two widely used datasets, THUMOS-14 and ActivityNet-1.3, demonstrating its effectiveness and generalizability. © 2022 Published by Elsevier B.V.
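The idea of fusing semantic and position context can be illustrated with a small sketch. The code below is not the paper's formulation, which the abstract does not specify; it is a minimal NumPy illustration under stated assumptions. The function names, the `scale` and `order_weight` parameters, and the additive form of the position bias are all hypothetical. It shows how a signed relative offset between frames can carry both distance (its magnitude) and chronological order (its sign), and how that term can be added to a feature-similarity score before attention-based aggregation.

```python
import numpy as np


def signed_relative_positions(num_frames):
    """Return rel[i, j] = j - i: magnitude is distance, sign is temporal order."""
    idx = np.arange(num_frames)
    return idx[None, :] - idx[:, None]


def position_bias(rel, scale=0.1, order_weight=0.5):
    """Hypothetical bias: penalize distance, and treat past and future differently."""
    return -scale * np.abs(rel) + order_weight * np.sign(rel)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_sensitive_attention(features, scale=0.1, order_weight=0.5):
    """Aggregate frame features using semantic similarity plus a position term.

    features: array of shape (num_frames, dim).
    Returns (aggregated features, attention weights).
    """
    num_frames, dim = features.shape
    # Semantic context: scaled dot-product similarity between frames.
    semantic = features @ features.T / np.sqrt(dim)
    # Position context: assumed additive bias from signed relative offsets.
    rel = signed_relative_positions(num_frames)
    weights = softmax(semantic + position_bias(rel, scale, order_weight), axis=-1)
    return weights @ features, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(6, 8))  # 6 frames with 8-dimensional features
    aggregated, attn = position_sensitive_attention(frames)
    print(aggregated.shape, attn.shape)
```

Because the bias depends on the sign of the offset, two frames at the same distance before and after a query frame receive different weights even when their features are identical, which is the property that purely semantic attention lacks.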