Action-Transformer for Action Recognition in Short Videos

Yumeng Cai, Guoyong Cai, Jin Cai
DOI: https://doi.org/10.1109/ICICIP53388.2021.9642184
2021-01-01
Abstract: Action recognition methods are mostly based on 3-Dimensional (3D) convolutional networks, which have some limitations in practice, e.g. redundant parameters, high memory consumption and low performance. In this paper, a new convolution-free model called action-transformer is proposed to address these problems. The proposed model is mainly composed of three modules: spatial-temporal transformatio...