Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection

Wenfei Yang, Tianzhu Zhang, Yongdong Zhang, Feng Wu
DOI: https://doi.org/10.1109/tpami.2022.3200399
IF: 23.6
2022-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: In weakly supervised (WSAL) and unsupervised temporal action localization (UAL), the target is to simultaneously localize temporal boundaries and identify category labels of actions with only video-level category labels (WSAL) or the number of categories in a dataset (UAL) during training. Among existing methods, attention-based methods have achieved superior performance in both tasks by highlighting action segments with foreground attention weights. However, without segment-level supervision on attention weight learning, the quality of the attention weights limits the performance of these methods. In this paper, we propose a novel Uncertainty Guided Collaborative Training (UGCT) strategy to alleviate this problem, which includes two key designs: (1) an online pseudo label generation module, in which the RGB and FLOW streams work collaboratively to learn from each other; and (2) an uncertainty-aware learning module, which mitigates the noise in the generated pseudo labels. These two designs work together to improve model performance effectively and efficiently by exchanging information between the RGB and FLOW streams. Extensive experimental results on two benchmark datasets with three attention-based methods demonstrate the effectiveness of the proposed method, e.g., more than 7.0% performance gain for mAP@IoU=0.5 on the THUMOS14 dataset.
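
The two designs named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the pseudo label rule or the loss, so the thresholding step, the log-variance weighting of the form exp(-s) * BCE + s, and every function and parameter name below are assumptions chosen only to show the general idea of cross-stream pseudo labels with uncertainty-based down-weighting.

```python
import math

def binarize(attention, threshold=0.5):
    """Turn one stream's foreground attention weights into hard pseudo labels.

    The 0.5 threshold is an assumed value, not taken from the paper.
    """
    return [1.0 if a > threshold else 0.0 for a in attention]

def uncertainty_weighted_bce(pred, pseudo, log_var, eps=1e-7):
    """Mean over segments of exp(-s) * BCE(pred, pseudo) + s.

    s is a per-segment predicted log-variance (assumed form). A large s
    down-weights a segment whose pseudo label is likely noisy, and the +s
    term penalizes declaring every segment uncertain.
    """
    total = 0.0
    for p, y, s in zip(pred, pseudo, log_var):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        total += math.exp(-s) * bce + s
    return total / len(pred)

def collaborative_loss(rgb_att, flow_att, rgb_log_var, flow_log_var):
    """Supervise each stream with pseudo labels derived from the other stream."""
    loss_rgb = uncertainty_weighted_bce(rgb_att, binarize(flow_att), rgb_log_var)
    loss_flow = uncertainty_weighted_bce(flow_att, binarize(rgb_att), flow_log_var)
    return loss_rgb + loss_flow

# Hypothetical per-segment attention weights for a three-segment video.
rgb_attention = [0.9, 0.2, 0.6]
flow_attention = [0.8, 0.1, 0.3]
loss = collaborative_loss(rgb_attention, flow_attention, [0.0] * 3, [0.0] * 3)
```

With all log-variances at zero the loss reduces to plain binary cross-entropy against the other stream's pseudo labels. Raising the log-variance on a segment where the two streams disagree strongly lowers that segment's contribution, which is the sense in which uncertainty can mitigate pseudo label noise.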