Spatiotemporal Perturbation Based Dynamic Consistency for Semi-supervised Temporal Action Detection

Lin Wang, Yan Song, Rui Yan, Xiangbo Shu
DOI: https://doi.org/10.1007/978-3-030-98358-1_15
2022-01-01
Abstract: Temporal action detection usually relies on costly annotation to achieve strong performance. Semi-supervised learning, where only a small portion of the training data is annotated, can help reduce the labeling burden. However, existing action detection models inevitably learn an inductive bias from the limited labeled data, which hinders the effective use of unlabeled data in semi-supervised learning. To this end, we propose a generic end-to-end framework for Semi-Supervised Temporal Action Detection (SS-TAD). Specifically, the framework is based on a teacher-student structure that leverages the consistency between unlabeled data and their augmentations. To achieve this, we propose a dynamic consistency loss that employs an attention mechanism to alleviate the prediction bias of the model, so that it can make full use of the unlabeled data. In addition, we design a concise yet effective spatiotemporal feature perturbation module to learn robust action representations. Experiments on THUMOS14 and ActivityNet v1.2 demonstrate that our method significantly outperforms state-of-the-art semi-supervised methods and is even comparable to fully supervised methods.
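The abstract names three ingredients: a teacher-student structure, a perturbation applied to features, and a weighted consistency loss on unlabeled data. The sketch below illustrates how such pieces commonly fit together. It is not the authors' implementation. The function names, the exponential-moving-average teacher update, the frame-dropping and Gaussian-noise perturbation, and the confidence-based weighting (used here as a simple stand-in for the paper's attention mechanism) are all assumptions made for illustration.

```python
import random

def ema_update(teacher, student, decay=0.99):
    """Assumed teacher update: exponential moving average of student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def perturb_features(features, drop_prob=0.2, noise_std=0.1, rng=None):
    """Hypothetical perturbation of a T x C feature sequence.

    Temporal part: zero out whole time steps with probability drop_prob.
    Channel part: add Gaussian noise to every value of the kept time steps.
    """
    rng = rng or random.Random(0)
    out = []
    for frame in features:
        if rng.random() < drop_prob:
            out.append([0.0] * len(frame))
        else:
            out.append([x + rng.gauss(0.0, noise_std) for x in frame])
    return out

def weighted_consistency_loss(student_probs, teacher_probs):
    """Squared error between per-snippet predictions, weighted by teacher confidence.

    The weight |2p - 1| is an illustrative choice, not the paper's attention
    mechanism: snippets where the teacher is unsure (p near 0.5) count less.
    """
    weights = [abs(2.0 * p - 1.0) for p in teacher_probs]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    weighted = sum(w * (s - t) ** 2
                   for w, s, t in zip(weights, student_probs, teacher_probs))
    return weighted / total
```

In a training loop built along these lines, the student would see perturbed features of an unlabeled video, the teacher would see the original features, the loss above would be added to the supervised loss on labeled videos, and `ema_update` would be called after each optimizer step.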