Truncated Attention-Aware Proposal Networks with Multi-Scale Dilation for Temporal Action Detection
Ping Li, Jiachen Cao, Li Yuan, Qinghao Ye, Xianghua Xu
DOI: https://doi.org/10.1016/j.patcog.2023.109684
IF: 8
2023-05-18
Pattern Recognition
Abstract: Detecting actions temporally in untrimmed videos is very challenging because it requires classifying and localizing actions simultaneously. Capturing the relations among action proposals (i.e., candidate video segments) is of vital importance. Although several attempts have been made to encode such relations, they neglect the adverse effects of irrelevant or negative relations among proposals. Moreover, action durations in videos are flexible, a crucial fact that has not been well explored. For the former, we develop a truncated attention mechanism that learns positive proposal relations by dynamically adjusting the edge weights of proposal nodes in a graph, and we construct the proposal network model using graph convolutional networks that suppress disadvantageous relations between proposal pairs by truncating negative attention scores. For the latter, we devise a lightweight multi-scale dilation module, shared by all proposals, that handles different action durations by enlarging the temporal receptive field, thus capturing temporal context to increase the representation capacity of proposals. Unifying these considerations, we present the Multi-scale Dilation based Truncated Attention Proposal Network (MD-TAPN) model for temporal action detection. Our model achieves state-of-the-art action detection performance on two benchmark datasets; in particular, it outperforms the most competitive method by a significant margin of 3.6% mAP at tIoU 0.5 on THUMOS14.
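The abstract describes its two components only at a high level. The following NumPy sketch illustrates the general ideas under stated assumptions; it is not the MD-TAPN implementation. Assumed details: edge weights come from scaled dot-product scores between proposal features, "truncation" is max(score, 0), self-loops plus row normalization turn the scores into a graph adjacency, and the multi-scale dilation module is a residual sum of depthwise three-tap temporal convolutions at dilation rates 1, 2 and 4 with weights shared across proposals. All function names, projection matrices and dilation rates are hypothetical.

```python
import numpy as np

def truncated_attention_adjacency(x, w_q, w_k):
    """Hypothetical edge weights for N proposal nodes; x has shape (N, D).

    Assumption: negative relation scores are truncated to zero, so
    'negative' proposal pairs contribute nothing to aggregation.
    """
    q, k = x @ w_q, x @ w_k                       # (N, H) queries and keys
    scores = q @ k.T / np.sqrt(q.shape[1])        # (N, N) pairwise relation scores
    scores = np.maximum(scores, 0.0)              # truncation: drop negative relations
    adj = scores + np.eye(len(x))                 # self-loops keep every row non-empty
    return adj / adj.sum(axis=1, keepdims=True)   # row-normalize to a weighted average

def gcn_layer(x, adj, w):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj @ x @ w, 0.0)

def multi_scale_dilation(feats, kernels, dilations=(1, 2, 4)):
    """Hypothetical residual multi-scale dilated temporal convolution.

    feats: (T, C) snippet features; kernels: one (3, C) depthwise kernel per
    dilation rate. A 3-tap kernel at dilation d spans 2*d + 1 time steps, so
    larger rates enlarge the temporal receptive field without extra weights.
    """
    T, _ = feats.shape
    out = feats.copy()                            # residual connection
    for kern, d in zip(kernels, dilations):
        padded = np.pad(feats, ((d, d), (0, 0)))  # zero padding keeps length T
        for tap in range(3):                      # taps read t - d, t, t + d
            out += kern[tap] * padded[tap * d : tap * d + T]
    return out

# Usage with random features for 5 proposals of dimension 8.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
adj = truncated_attention_adjacency(x, rng.standard_normal((8, 4)), rng.standard_normal((8, 4)))
h = gcn_layer(x, adj, rng.standard_normal((8, 8)))
```

Truncating before normalizing (rather than applying a softmax) is the choice that lets negative pairs receive exactly zero weight, since a softmax would still assign every pair a positive weight.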
computer science, artificial intelligence; engineering, electrical & electronic