A Novel Temporal Channel Enhancement and Contextual Excavation Network for Temporal Action Localization
Zan Gao, Xinglei Cui, Yibo Zhao, Tao Zhuo, Weili Guan, Meng Wang
DOI: https://doi.org/10.1145/3581783.3612167
2023-01-01
Abstract: The temporal action localization (TAL) task aims to locate and classify action instances in untrimmed videos. Most previous methods apply the classifier and the locator to the same feature, so classification and localization proceed relatively independently. As a result, when the two outputs are fused, an instance may be classified correctly but localized incorrectly, or vice versa, which makes the final results inaccurate. To solve this problem, we propose a novel temporal channel enhancement and contextual excavation network (TCN) for the TAL task, which generates robust classification and localization features and refines the final localization results. Specifically, a temporal channel enhancement module is designed to enhance the temporal and channel information of the feature sequence. Then, a temporal semantic contextual excavation module is developed to establish relationships between similar frames. Finally, the features enriched with contextual information are passed to a classifier, and the classification process yields powerful classification features. Most importantly, the refine localization module uses these robust classification features to produce the final localization features, from which the final localization results are obtained. Extensive experiments show that TCN outperforms the state-of-the-art (SOTA) methods it is compared with on the THUMOS14 dataset and achieves comparable performance on the ActivityNet1.3 dataset. Compared with ActionFormer (ECCV 2022) and BREM (MM 2022) on THUMOS14, TCN achieves improvements of 1.8% and 5.0%, respectively.
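The abstract names four stages (temporal channel enhancement, temporal semantic contextual excavation, classification, refine localization) but gives no equations. As a rough illustration of how such stages could be chained, here is a minimal, hypothetical sketch in plain Python. Every concrete choice is an assumption made for illustration and is not TCN's actual design: SE-style sigmoid channel gating, a residual moving average for the temporal part, softmax-over-cosine-similarity aggregation for "relationships between similar frames", a classifier whose hidden activations serve as the "classification features", and element-wise sum fusion followed by a ReLU regressor of (start, end) distances. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Seq = List[List[float]]  # feature sequence: T frames x C channels


def channel_enhance(x: Seq) -> Seq:
    """Hypothetical channel enhancement (SE-style gating): average each channel
    over time, squash with a sigmoid, and rescale that channel by the gate."""
    T, C = len(x), len(x[0])
    means = [sum(x[t][c] for t in range(T)) / T for c in range(C)]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [[x[t][c] * gates[c] for c in range(C)] for t in range(T)]


def temporal_enhance(x: Seq, k: int = 3) -> Seq:
    """Hypothetical temporal enhancement: add a k-frame moving average (residual)."""
    T, C = len(x), len(x[0])
    r = k // 2
    out = []
    for t in range(T):
        lo, hi = max(0, t - r), min(T, t + r + 1)
        avg = [sum(x[s][c] for s in range(lo, hi)) / (hi - lo) for c in range(C)]
        out.append([x[t][c] + avg[c] for c in range(C)])
    return out


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(u * v for u, v in zip(a, b))
    na = math.sqrt(sum(u * u for u in a))
    nb = math.sqrt(sum(v * v for v in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def context_excavate(x: Seq) -> Seq:
    """Hypothetical contextual excavation: each frame aggregates all frames,
    weighted by a softmax over cosine similarity, so similar frames share context."""
    T, C = len(x), len(x[0])
    out = []
    for t in range(T):
        sims = [cosine(x[t], x[s]) for s in range(T)]
        m = max(sims)
        w = [math.exp(s - m) for s in sims]  # numerically stable softmax
        z = sum(w)
        ctx = [sum(w[s] * x[s][c] for s in range(T)) / z for c in range(C)]
        out.append([x[t][c] + ctx[c] for c in range(C)])
    return out


def linear(x: Seq, W: List[List[float]]) -> Seq:
    """Per-frame linear layer: W holds one weight row per output unit."""
    return [[sum(wi * fi for wi, fi in zip(row, f)) for row in W] for f in x]


def classify(x: Seq, W_hid: List[List[float]], W_out: List[List[float]]) -> Tuple[Seq, Seq]:
    """Classifier whose ReLU hidden activations double as 'classification features'."""
    hidden = [[max(0.0, v) for v in f] for f in linear(x, W_hid)]
    return hidden, linear(hidden, W_out)


def refine_localize(loc: Seq, cls_feat: Seq, W_reg: List[List[float]]) -> List[Tuple[float, float]]:
    """Hypothetical refinement: fuse localization and classification features by
    element-wise sum, then regress non-negative (start, end) distances per frame."""
    fused = [[a + b for a, b in zip(l, c)] for l, c in zip(loc, cls_feat)]
    return [(max(0.0, s), max(0.0, e)) for s, e in linear(fused, W_reg)]
```

A toy run on 6 frames with 4 channels would call `context_excavate(temporal_enhance(channel_enhance(feats)))`, pass the result to `classify`, and hand both the enhanced features and the classifier's hidden features to `refine_localize`. The point of the last step, under these assumptions, is that the boundary regressor sees the same features that drove the class decision, rather than an independent branch.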