Weakly-Supervised Temporal Action Localization with Regional Similarity Consistency

Haoran Ren, Hao Ren, Hong Lu, Cheng Jin
DOI: https://doi.org/10.1007/978-3-031-27077-2_6
2023-01-01
Abstract: The weakly-supervised temporal action localization task aims to train a model that can accurately locate each action instance in a video using only video-level class labels. Existing methods take into account information from different modalities (primarily RGB and Flow) and present numerous multi-modal complementary methods. RGB features are obtained from appearance information and are easily disrupted by the background. In contrast, Flow features are obtained from motion information and are usually less disrupted by the background. Based on this observation, we propose a Regional Similarity Consistency (RSC) constraint between these two modalities to suppress background disturbance in the RGB features. Specifically, we calculate the regional similarity matrices of the RGB and Flow features and impose the consistency constraint through an L2 loss. To verify the effectiveness of our method, we integrate the proposed RSC constraint into three recent methods. Comprehensive experimental results show that the proposed RSC constraint boosts the performance of these methods and achieves state-of-the-art results on the widely used THUMOS14 and ActivityNet1.2 datasets.
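The consistency constraint described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how regions are formed or which similarity measure is used, so everything here is an assumption for illustration: regions are taken to be non-overlapping temporal windows of `region_size` snippets, each region is mean-pooled, and similarity is cosine similarity. The function names `regional_similarity` and `rsc_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np


def regional_similarity(features, region_size):
    """Cosine similarity between mean-pooled temporal regions.

    features: array of shape (T, D), one D-dimensional feature per snippet.
    Returns an (R, R) matrix with R = T // region_size.
    Assumption: regions are non-overlapping windows; trailing snippets
    that do not fill a whole window are dropped.
    """
    t, d = features.shape
    r = t // region_size
    regions = features[: r * region_size].reshape(r, region_size, d).mean(axis=1)
    norms = np.linalg.norm(regions, axis=1, keepdims=True)
    unit = regions / np.maximum(norms, 1e-8)  # guard against zero vectors
    return unit @ unit.T


def rsc_loss(rgb, flow, region_size=4):
    """Mean squared (L2) difference between the two similarity matrices."""
    s_rgb = regional_similarity(rgb, region_size)
    s_flow = regional_similarity(flow, region_size)
    return float(np.mean((s_rgb - s_flow) ** 2))


# Toy usage with random features standing in for real RGB and Flow streams.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(32, 16))
flow = rng.normal(size=(32, 16))
print(rsc_loss(rgb, flow))
```

In a training setup, a term like this would presumably be added to the base method's loss with a weighting factor; that weighting, and whether gradients flow through one or both modalities, is not stated in the abstract.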