Multi-Scale Structure-Aware Network for Weakly Supervised Temporal Action Detection

Wenfei Yang, Tianzhu Zhang, Zhendong Mao, Yongdong Zhang, Qi Tian, Feng Wu
DOI: https://doi.org/10.1109/tip.2021.3089361
IF: 10.6
2021-01-01
IEEE Transactions on Image Processing
Abstract: Weakly supervised temporal action detection is more scalable and practical than fully supervised action detection in real-world deployment. However, it is difficult to learn a robust model without temporal action boundary annotations. In this paper, we propose an end-to-end Multi-Scale Structure-Aware Network (MSA-Net) for weakly supervised temporal action detection that explores both the global structure information of a video and the local structure information of actions. The proposed MSA-Net has several merits. First, to localize actions with different durations, each video is encoded into feature representations at different temporal scales. Second, based on the multi-scale feature representation, the proposed model uses two structure modeling mechanisms, global structure modeling and local structure modeling, which learn discriminative structure-aware representations for robust and complete action detection. To the best of our knowledge, this is the first work to fully explore the global and local structure information in a unified deep model for weakly supervised action detection. Extensive experimental results on two benchmark datasets demonstrate that the proposed MSA-Net performs favorably against state-of-the-art methods.
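The multi-scale encoding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how MSA-Net builds its temporal scales, so the non-overlapping average pooling used here is only an assumed stand-in, and the names temporal_avg_pool, multi_scale_encode, and the window sizes (1, 2, 4) are hypothetical.

```python
from typing import Dict, List, Sequence

Feature = List[float]


def temporal_avg_pool(features: List[Feature], window: int) -> List[Feature]:
    """Average each non-overlapping run of `window` consecutive snippet features."""
    pooled = []
    for start in range(0, len(features), window):
        chunk = features[start:start + window]  # last chunk may be shorter
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


def multi_scale_encode(
    features: List[Feature], windows: Sequence[int] = (1, 2, 4)
) -> Dict[int, List[Feature]]:
    """Return one pooled feature sequence per temporal scale (assumed scheme)."""
    return {w: temporal_avg_pool(features, w) for w in windows}


# Toy input: 8 snippets with 2-dimensional features (made-up values).
snippets = [[float(t), float(2 * t)] for t in range(8)]
pyramid = multi_scale_encode(snippets)
```

Coarser scales cover longer time spans per feature, which is the intuition behind using several scales to handle actions of different durations.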
computer science, artificial intelligence, engineering, electrical & electronic