Self-attention relational modeling and background suppression for weakly supervised temporal action localization

Jing Wang, Chuanxu Wang
DOI: https://doi.org/10.1117/1.jei.31.6.063019
IF: 0.829
2022-01-01
Journal of Electronic Imaging
Abstract: Weakly supervised temporal action localization aims to locate the start and end boundaries of action instances and recognize their categories. Classical methods include random erasing, attention mechanisms, and cross-temporal graph relationship modeling. Despite great progress, two challenges remain: incomplete localization and background interference. We therefore propose a framework that combines self-attention relationship modeling with background suppression to address both issues. First, a filtering module suppresses the input features of background frames, which prevents interference from background noise. Second, a self-attention mechanism models the relationships between different segments in the video, refining the action features to encourage smoother temporal classification scores and thus more complete localization. Finally, under the guidance of the classification loss $L_{\mathrm{act}}$, the refined segment features and foreground weights are combined by attention-weighted pooling to produce the video-level prediction. The algorithm is evaluated on the THUMOS14 and ActivityNet1.2 datasets and compared with related methods, demonstrating its feasibility and effectiveness. (c) 2022 SPIE and IS&T
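The three stages named in the abstract (background filtering, self-attention over segments, attention-weighted pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the weight matrices (`w_fg`, `w_q`, `w_k`, `w_v`, `w_cls`), the sigmoid foreground gate, the single-head scaled dot-product attention, and the residual connection are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def video_level_prediction(feats, w_fg, w_q, w_k, w_v, w_cls):
    """Hypothetical forward pass. feats: (T, D) segment features."""
    # 1) Background suppression: a per-segment foreground weight in (0, 1)
    #    scales down features of likely background segments (assumed gate).
    fg = sigmoid(feats @ w_fg)                      # (T,)
    filtered = feats * fg[:, None]                  # (T, D)

    # 2) Self-attention across segments (assumed single-head, scaled dot-product).
    q, k, v = filtered @ w_q, filtered @ w_k, filtered @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)   # (T, T)
    refined = filtered + attn @ v                   # residual link is an assumption

    # Temporal class scores, one row per segment.
    cas = refined @ w_cls                           # (T, C)

    # 3) Attention-weighted pooling of refined features with foreground weights,
    #    followed by classification into a video-level distribution.
    pooled = (fg[:, None] * refined).sum(axis=0) / (fg.sum() + 1e-8)  # (D,)
    probs = softmax(pooled @ w_cls)                 # (C,)
    return probs, cas, fg

# Usage with random stand-in features and weights (T=8 segments, D=16, C=5).
rng = np.random.default_rng(0)
T, D, C = 8, 16, 5
feats = rng.normal(size=(T, D))
probs, cas, fg = video_level_prediction(
    feats,
    rng.normal(size=D),
    rng.normal(size=(D, D)), rng.normal(size=(D, D)), rng.normal(size=(D, D)),
    rng.normal(size=(D, C)),
)
```

In training, the video-level distribution `probs` would be compared against the video-level labels by a classification loss, while the per-segment scores `cas` and weights `fg` would be thresholded at inference time to propose action boundaries; both steps are outside this sketch.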