Weakly-supervised Temporal Action Localization with Adaptive Clustering and Refining Network

Hao Ren, Wu Ran, Xingson Liu, Haoran Ren, Hong Lu, Rui Zhang, Cheng Jin
DOI: https://doi.org/10.1109/icme55011.2023.00177
2023-01-01
Abstract: The weakly-supervised temporal action localization task aims to localize the temporal boundaries of action instances using only video-level labels. Existing methods primarily adopt the Multi-Instance-Learning (MIL) scheme to handle this task. The effectiveness of the MIL scheme depends heavily on the selection of the top-k action snippets, which is unstable and requires manual tuning. To address these deficiencies, we propose an Adaptive Clustering and Refining Network (ACRNet). Specifically, we present an action-aware clustering strategy that is adaptable and requires no manual tuning; it separates the action and background snippets of diverse videos based on the intra-class activation distribution. A cluster refining step is also included to eliminate false action snippets by considering the inter-class activation distribution, which greatly improves robustness and localization accuracy. Extensive experiments on the THUMOS14 and ActivityNet 1.2 and 1.3 benchmarks show that our method achieves state-of-the-art performance.
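To make the top-k sensitivity mentioned in the abstract concrete, here is a minimal sketch of the generic top-k pooling used in MIL-style baselines. It is not ACRNet's clustering or refining step. The function name `topk_mil_score` and the activation values are hypothetical and chosen only for illustration.

```python
def topk_mil_score(snippet_scores, k):
    """Video-level score for one class: mean of the k highest snippet activations."""
    if not 1 <= k <= len(snippet_scores):
        raise ValueError("k must be between 1 and the number of snippets")
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / k


# Hypothetical per-snippet activations for one action class in one video.
# Three snippets (0.9, 0.8, 0.7) look like action; the rest look like background.
scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.1, 0.1, 0.2]

for k in (1, 3, 6):
    # A larger k pulls background snippets into the average and lowers the score.
    print(k, round(topk_mil_score(scores, k), 3))
```

In this toy case the video-level score drops once k exceeds the number of true action snippets. This is the kind of manual tuning burden the abstract attributes to top-k selection.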