Localizing Action Moments with Dual-Guidance Augmented Discriminability

Qiantai Chen, Meiqing Wang, Hang Cheng, Fei Chen, Danni Luo, Yuxin Dong
DOI: https://doi.org/10.1109/icipca61593.2024.10708788
2024-01-01
Abstract: Temporal action detection (TAD) aims to locate action boundaries and recognize their categories in video clips. However, boundary predictions become vague when short action moments switch quickly. At the same time, boundaries are vulnerable: they can be misled by changes in similar-looking video backgrounds rather than by the motion itself, so the model's fine-grained discrimination relies heavily on the background. To mitigate these issues, we present a novel dual-guidance graph-based architecture, dubbed the dual-guidance self-augmented graph network (DSGN). Specifically, we first exploit an effective expansion approach that inserts an all-zero gap sequence between the original feature and the enhanced feature, which suppresses the effect of quick switches and increases the localization accuracy of short action boundaries. Then, we design a relational dual-aggregation method that integrates two levels of context representation, the local fragment-level feature and the global video-level feature, into the target proposal-level feature, weighting them by their attention correlations under the self-attention mechanism, to augment the target proposal clip. Experiments on HACS, ActivityNet v1.3, and FineActions show that our model outperforms other state-of-the-art methods.
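The abstract names two mechanisms without giving their exact form. The following is a minimal, hypothetical sketch of how each could look, written in plain Python so it runs without dependencies. All function names are illustrative and not from the paper. It assumes that (1) the zero gap is a temporal concatenation `[original; zeros; enhanced]`, (2) the global video-level feature is a mean over time, and (3) dual aggregation is single-head scaled dot-product attention with the proposal feature as query, the fragment features plus the video feature as keys/values, no learned projections, and a residual sum. It does not model the graph structure of DSGN.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # rows are time steps (or fragments), columns are channels


def zero_gap_expand(original: Matrix, enhanced: Matrix, gap: int) -> Matrix:
    """Hypothetical: stack original and enhanced features along time,
    separated by `gap` all-zero time steps (assumed layout)."""
    channels = len(original[0])
    if any(len(row) != channels for row in original + enhanced):
        raise ValueError("all feature rows must have the same channel count")
    zeros = [[0.0] * channels for _ in range(gap)]
    return [row[:] for row in original] + zeros + [row[:] for row in enhanced]


def mean_pool(features: Matrix) -> Vector:
    """Assumed global video-level feature: average over all time steps."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_aggregate(proposal: Vector, fragments: Matrix, video: Vector) -> Vector:
    """Hypothetical dual aggregation: the proposal feature attends over the
    local fragment features and the global video feature, then the attended
    context is added back to the proposal (assumed residual connection)."""
    context = fragments + [video]  # local level + global level
    scale = math.sqrt(len(proposal))
    scores = [sum(p * c for p, c in zip(proposal, ctx)) / scale for ctx in context]
    weights = softmax(scores)
    attended = [
        sum(w * ctx[i] for w, ctx in zip(weights, context))
        for i in range(len(proposal))
    ]
    return [p + a for p, a in zip(proposal, attended)]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    expanded = zero_gap_expand(frames, [[2.0, 2.0]], gap=2)
    print(expanded)
    video = mean_pool(frames)
    print(dual_aggregate([0.5, 0.5], frames[:2], video))
```

With `gap=2`, the expanded sequence has the three original frames, two zero rows, and then the enhanced row. The zeros keep the enhanced copy from sitting directly against the original one in time. In the real model, the gap length, the enhancement, and any learned query/key/value projections would be design choices that the abstract leaves unspecified.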