SGLP-Net: Sparse Graph Label Propagation Network for Weakly-Supervised Temporal Action Localization

Xiaoyao Wu, Yonghong Song
DOI: https://doi.org/10.1007/978-981-99-8073-4_12
2024-01-01
Abstract: Existing weakly-supervised methods for Temporal Action Localization focus primarily on capturing temporal context. However, these approaches are limited in capturing semantic context and therefore risk ignoring snippets that are far apart in time but share the same action category. To address this issue, we propose an action label propagation network that uses sparse graph networks to explore both temporal and semantic information in videos. The proposed SGLP-Net comprises two key components. The first is a multi-scale temporal feature embedding module, a novel and generic module that extracts both local and global temporal features of videos at the initial stage using a CNN and self-attention. The second is an action label propagation mechanism, which uses graph networks for feature aggregation and label propagation. To avoid the issue of excessive feature completeness, we optimize training using sparse graph convolutions. Extensive experiments on the THUMOS14 and ActivityNet1.3 benchmarks yield advanced results that demonstrate the superiority of the proposed method. Code is available at https://github.com/xyao-wu/SGLP-Net
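To make the idea of label propagation over a sparse snippet graph more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the top-k cosine-similarity graph, the function names `sparse_topk_adjacency` and `propagate_labels`, and the parameters `k`, `alpha`, and `steps` are all assumptions chosen for illustration, and the propagation here is a fixed, non-learned update rather than the trained sparse graph convolutions described in the abstract.

```python
import numpy as np


def sparse_topk_adjacency(features, k):
    """Build a row-normalized top-k cosine-similarity graph over snippets.

    Hypothetical helper: keeping only the k strongest neighbors per snippet
    is one simple way to obtain a sparse graph; the paper may differ.
    """
    features = np.asarray(features, dtype=float)
    # L2-normalize so that the dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from the top-k choice
    # Column indices of the k most similar snippets for each row.
    topk = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    adj = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, topk] = np.maximum(sim[rows, topk], 0.0)  # non-negative weights
    # Row-normalize; a row with no positive neighbor stays all-zero.
    row_sum = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, row_sum, out=np.zeros_like(adj), where=row_sum > 0)


def propagate_labels(adj, scores, alpha=0.5, steps=10):
    """Iteratively mix each snippet's class scores with its neighbors' scores.

    alpha (assumed) weights the neighbor average against the initial scores.
    """
    scores = np.asarray(scores, dtype=float)
    out = scores.copy()
    for _ in range(steps):
        out = alpha * (adj @ out) + (1.0 - alpha) * scores
    return out
```

In this sketch, a snippet with no initial class evidence can still receive a score from semantically similar snippets regardless of how far apart they are in time, because the graph is built from feature similarity rather than temporal adjacency.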