Spatio-temporal Deformable 3D ConvNets with Attention for Action Recognition

Jun Li, Xianglong Liu, Mingyuan Zhang, Deqing Wang
DOI: https://doi.org/10.1016/j.patcog.2019.107037
IF: 8
2020-01-01
Pattern Recognition
Abstract:
• We are the first to propose spatio-temporal deformable 3D convolutions with an attention mechanism (STDA for short).
• The proposed module serves as a generic module for many 3D CNNs. In practice, it only needs to be appended at the later convolution layers and does not add much computational cost.
• Our attention mechanism can exploit both long-range temporal dependencies across multiple frames and long-distance spatial dependencies inside each frame. It therefore helps extract discriminative global information at both the inter-frame and intra-frame levels.
• Experiments validate the superior performance and efficiency of the proposed approach.
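The third point above describes attention over long-range temporal dependencies across frames (inter-frame) and long-distance spatial dependencies within each frame (intra-frame). The sketch below illustrates that idea with generic non-local-style self-attention in NumPy. It is not the STDA formulation from the paper, and it omits the deformable 3D convolution entirely. The function names, the (T, H, W, C) feature layout, the spatially pooled frame descriptors, the scaled dot-product affinities, the residual connections and the temporal-then-spatial ordering are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention(feat):
    """Hypothetical inter-frame attention: every frame attends to every frame.

    feat: array of shape (T, H, W, C) (assumed layout).
    """
    T, H, W, C = feat.shape
    # Assumption: summarize each frame by averaging over space -> (T, C).
    desc = feat.mean(axis=(1, 2))
    # Scaled dot-product affinities between all pairs of frames -> (T, T).
    attn = softmax(desc @ desc.T / np.sqrt(C), axis=-1)
    # Each output frame is an attention-weighted mix of all input frames.
    mixed = np.einsum("ts,shwc->thwc", attn, feat)
    # Residual connection keeps the original features.
    return feat + mixed


def spatial_attention(feat):
    """Hypothetical intra-frame attention: each position attends to all
    positions of its own frame."""
    T, H, W, C = feat.shape
    flat = feat.reshape(T, H * W, C)
    # Per-frame affinities between all pairs of positions -> (T, HW, HW).
    attn = softmax(np.einsum("tpc,tqc->tpq", flat, flat) / np.sqrt(C), axis=-1)
    # Aggregate features from every position of the same frame.
    mixed = np.einsum("tpq,tqc->tpc", attn, flat)
    return feat + mixed.reshape(T, H, W, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 7, 7, 16))  # 8 frames of 7x7 maps, 16 channels
    y = spatial_attention(temporal_attention(x))
    print(y.shape)  # (8, 7, 7, 16)
```

Both functions preserve the feature shape, so in this sketch they can be stacked after any 3D convolution stage. That loosely mirrors the abstract's claim that the module can be appended to existing 3D CNNs, but the actual placement and design should be taken from the paper.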