Deep Feature Enhancing and Selecting Network for Weakly Supervised Temporal Action Localization

Jiaruo Yu, Yongxin Ge, Xiaolei Qin, Ziqiang Li, Sheng Huang, Feiyu Chen
DOI: https://doi.org/10.1016/j.jvcir.2021.103276
IF: 2.887
2021-01-01
Journal of Visual Communication and Image Representation
Abstract: Weakly supervised temporal action localization is a challenging computer vision problem: it relies only on video-level labels and has no temporal annotations for supervision. In this task, most existing methods identify only the most discriminative snippets and ignore other relevant snippets. To address this problem, we propose a deep feature enhancing and selecting network. It generates multiple masks that capture more complete temporal intervals of actions while maintaining high classification accuracy. We further propose a novel selection strategy that balances the influence of the multiple masks and improves model performance. We evaluate the proposed method on the THUMOS’14 and ActivityNet datasets, and the results show the effectiveness of our approach for weakly supervised temporal action localization.
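The abstract gives no implementation details, so the following is only a generic sketch of the erasing idea that the "multiple masks" description suggests, not the authors' network. It assumes a snippet-level class activation sequence (CAS) of shape (T, C) is already available, uses top-k mean pooling to obtain video-level scores (the usual way to train from video-level labels only), and repeatedly masks out the most discriminative snippets so later rounds are forced onto other relevant snippets. All function names (`topk_pool`, `erasing_masks`, `select_snippets`, `to_intervals`), the threshold `tau`, and the stop-at-first-rejected-round rule are hypothetical stand-ins for the paper's masks and selection strategy.

```python
import numpy as np


def topk_pool(cas: np.ndarray, k: int) -> np.ndarray:
    """Video-level class scores: mean of the k highest snippet scores per class."""
    return np.sort(cas, axis=0)[-k:].mean(axis=0)


def erasing_masks(cas: np.ndarray, cls: int, k: int, num_masks: int) -> list:
    """Hypothetical cumulative erasing: each round hides the k still-visible
    snippets with the highest score for class `cls`; returns boolean keep-masks."""
    keep = np.ones(cas.shape[0], dtype=bool)
    scores = cas[:, cls].astype(float)
    masks = []
    for _ in range(num_masks):
        live = np.where(keep)[0]
        if live.size == 0:
            break
        # Indices (into the full sequence) of the top-k visible snippets.
        top = live[np.argsort(scores[live])[::-1][:k]]
        keep = keep.copy()
        keep[top] = False
        masks.append(keep)
    return masks


def select_snippets(cas: np.ndarray, cls: int, masks: list, tau: float) -> np.ndarray:
    """Hypothetical selection rule: accept the snippets erased in a round only if
    their mean score for `cls` is at least tau; stop at the first rejected round."""
    selected = np.zeros(cas.shape[0], dtype=bool)
    prev = np.ones(cas.shape[0], dtype=bool)
    for keep in masks:
        erased = prev & ~keep
        if cas[erased, cls].mean() < tau:
            break
        selected |= erased
        prev = keep
    return selected


def to_intervals(selected: np.ndarray) -> list:
    """Group consecutive selected snippets into half-open [start, end) intervals."""
    intervals, start = [], None
    for t, s in enumerate(selected):
        if s and start is None:
            start = t
        elif not s and start is not None:
            intervals.append((start, t))
            start = None
    if start is not None:
        intervals.append((start, len(selected)))
    return intervals


if __name__ == "__main__":
    # Toy single-class CAS over 10 snippets (made-up numbers for illustration).
    cas = np.array([0.1, 0.2, 0.9, 0.8, 0.6, 0.5, 0.1, 0.05, 0.3, 0.1])[:, None]
    masks = erasing_masks(cas, cls=0, k=2, num_masks=3)
    print(to_intervals(select_snippets(cas, 0, masks, tau=0.4)))  # [(2, 6)]
```

In this toy run, top-k pooling alone attends only to snippets 2 and 3. The second erasing round uncovers snippets 4 and 5, and the third round is rejected by the threshold, so the localized interval grows to [2, 6). This illustrates, in a simplified form, the trade-off the abstract describes: more masks recover more of the action, while a selection step keeps low-scoring background snippets out.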