Two-Stream Completeness Modeling for Weakly Supervised Temporal Action Detection

Miao Ma, Xiaoqiu Chen, Mengge Li
DOI: https://doi.org/10.1109/TALE52509.2021.9678609
2021-12-05
Abstract: Making frame-wise temporal annotations for videos is expensive during dataset construction. Weakly supervised temporal action detection methods, which use only video-level action category annotations during training, have therefore become an important research branch. At present, most weakly supervised methods adopt early feature fusion and face two persistent problems: action completeness and background frame interference. To address them, this paper proposes a method based on the Two-Stream Completeness Modeling (TSCM) network. First, the method feeds spatial-stream features and temporal-stream features into the network separately, to make full use of the characteristics of the two modalities. Second, it uses multi-branch complementary completeness modeling to generate action instances that are as complete as possible. Finally, an Angular Center Loss with a Pair of Triplets (ACL-PT) is designed to suppress interference from background frames. In addition, this paper constructs a temporal action detection dataset (STAD), built from learning scenes, to explore the effectiveness of the method in real applications. Experimental results show that the proposed TSCM method outperforms most mainstream methods in mean Average Precision (mAP) on the THUMOS14 and ActivityNet1.2 datasets, and also achieves good detection accuracy on the STAD dataset.
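For readers unfamiliar with the metric: mAP for temporal action detection is conventionally computed by matching predicted segments to ground-truth segments at a temporal Intersection over Union (tIoU) threshold. The following is a minimal sketch of the tIoU computation only, not the paper's evaluation code; the function name `temporal_iou` and the example segments are hypothetical.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds.

    Illustrative helper only; the name and signature are assumptions,
    not taken from the paper.
    """
    # Overlap is the length of the shared interval, clipped at zero.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union is the sum of both lengths minus the overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical segments: a prediction that covers most of the true action.
predicted = (2.0, 6.0)
ground_truth = (3.0, 7.0)
score = temporal_iou(predicted, ground_truth)
```

A prediction that covers only part of an action yields a low tIoU and fails to match at stricter thresholds, which is why action completeness affects mAP directly.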
Computer Science