Complementary Attention Network for Weakly Supervised Temporal Action Localization

Peng Dou, Haifeng Hu
DOI: https://doi.org/10.1007/s11063-023-11156-w
2023-01-01
Abstract: Weakly supervised temporal action localization reduces the high cost of manual labeling. Several existing methods build on an attention framework. However, we observe that attention-based methods attend only to local segments and ignore the dependencies between individual segments. To address this issue, we propose a complementary attention network (CAN) to capture dependencies between segments. Specifically, we design a global attention branch and a channel attention branch: the former exploits inter-segment information, and the latter enhances video features. Based on the channel attention branch, a sparse loss and a similarity loss are proposed to identify actions from sparse subsets of video segments and to summarize action features, respectively. Combining these designs, the CAN model effectively optimizes the network and improves localization accuracy. Our model achieves excellent results on the THUMOS14 and ActivityNet1.2 datasets.
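The two branches named in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything below is an assumption: scaled dot-product self-attention stands in for the global attention branch, a squeeze-and-excitation style gate stands in for the channel attention branch, and the function names, weight shapes, and reduction ratio are hypothetical. The sparse and similarity losses are not sketched because the abstract does not define them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_attention(features):
    """Assumed stand-in for a global attention branch.

    features: (T, D) array, one D-dimensional feature per video segment.
    Each output segment is a weighted mix of all segments, so the result
    carries inter-segment information.
    """
    _, dim = features.shape
    scores = features @ features.T / np.sqrt(dim)  # (T, T) segment affinities
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ features                      # (T, D)


def channel_attention(features, w1, w2):
    """Assumed stand-in for a channel attention branch (SE-style gate).

    Averages over segments, passes the result through a small bottleneck,
    and rescales every feature channel by a gate in (0, 1).
    """
    squeezed = features.mean(axis=0)                 # (D,)
    hidden = np.maximum(squeezed @ w1, 0.0)          # ReLU, (D // R,)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))      # sigmoid, (D,)
    return features * gate, gate


# Hypothetical sizes: 8 segments, 16 channels, reduction ratio 4.
rng = np.random.default_rng(0)
T, D, R = 8, 16, 4
features = rng.standard_normal((T, D))
w1 = rng.standard_normal((D, D // R)) * 0.1
w2 = rng.standard_normal((D // R, D)) * 0.1

global_out = global_attention(features)
channel_out, gate = channel_attention(features, w1, w2)
```

Both outputs keep the input shape of one feature vector per segment, which is what would let two such branches be combined downstream; how CAN actually fuses them is not stated in the abstract.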