Graph-based temporal action co-localization from an untrimmed video

Le Wang, Changbo Zhai, Qilin Zhang, Wei Tang, Nanning Zheng, Gang Hua
DOI: https://doi.org/10.1016/j.neucom.2020.12.126
IF: 6
2021-04-01
Neurocomputing
Abstract: We present an efficient approach to temporal action co-localization (TACL), the task of simultaneously localizing all action instances in an untrimmed video. Compared with conventional instance-by-instance action localization, TACL can exploit the contextual and temporal relationships among action instances to reduce localization ambiguity. Motivated by the strong relational modeling capability of graph neural networks, we propose a Graph-based Temporal Action Co-Localization (G-TACL) method. By treating each action proposal as a node, G-TACL aggregates contextual and temporal features from related action proposals to jointly recognize and localize all action instances in a single shot. Moreover, we introduce a novel multi-level consistency evaluator that measures the relatedness between any two action proposals by considering their high-level contextual similarities, low-level temporal coincidences, and feature correlations. We use Gated Recurrent Units (GRUs) to iteratively update the features of each node, which are then used to regress the temporal boundaries of the action proposals and thereby achieve action co-localization. Experimental results on three datasets, i.e., THUMOS14, MEXaction2, and ActivityNet v1.3, demonstrate that G-TACL is superior or comparable to state-of-the-art methods.
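The proposal-graph idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the relatedness score below (an equal-weight average of temporal IoU and clipped cosine similarity) is only an assumed stand-in for the multi-level consistency evaluator, the weight matrices are random rather than learned, the names `build_adjacency`, `gru_update`, and `propagate` are hypothetical, and the boundary regression step is omitted.

```python
import numpy as np

def temporal_iou(a, b):
    """Intersection over union of two 1-D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def build_adjacency(segments, feats):
    """Assumed relatedness: mean of temporal IoU and non-negative cosine
    similarity, with self-loops removed and rows normalised to sum to 1."""
    n = len(segments)
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    cos = np.clip(unit @ unit.T, 0.0, None)
    iou = np.array([[temporal_iou(segments[i], segments[j]) for j in range(n)]
                    for i in range(n)])
    adj = 0.5 * iou + 0.5 * cos
    np.fill_diagonal(adj, 0.0)
    return adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-8)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_update(h, m, params):
    """Standard GRU cell: the aggregated message m is the input,
    the current node features h are the hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(m @ Wz + h @ Uz)               # update gate
    r = sigmoid(m @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(m @ Wh + (r * h) @ Uh)   # candidate state
    return (1.0 - z) * h + z * h_tilde

def propagate(segments, feats, params, steps=2):
    """Iteratively refine proposal features by message passing on the graph."""
    adj = build_adjacency(segments, feats)
    h = feats.copy()
    for _ in range(steps):
        m = adj @ h                  # aggregate features from related proposals
        h = gru_update(h, m, params)
    return h

# Toy data (illustrative only): three proposals with 8-D features.
rng = np.random.default_rng(0)
segments = np.array([[0.0, 4.0], [2.0, 6.0], [10.0, 12.0]])
feats = rng.standard_normal((3, 8))
params = [0.1 * rng.standard_normal((8, 8)) for _ in range(6)]
updated = propagate(segments, feats, params)
```

In a full pipeline of the kind the abstract describes, the refined features in `updated` would feed classification and boundary-regression heads; those components, and any learned weighting of the relatedness terms, are not specified here.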
computer science, artificial intelligence