Higher-order Network for Action Recognition

Hu Kai, Shao Jie, Raj Bhiksha, Bao Yixin, Xue Xiangyang
2019-01-01
Abstract: Capturing spatiotemporal contexts is an essential problem in action recognition. In this paper, we present a higher-order architecture that learns position-varying contextual information using higher-order structures. The design is based on the hypothesis that spatiotemporal contexts are sensitive to space-time position but follow the same learnable pattern at different positions. We test our method on four benchmark datasets for action recognition: Kinetics-400, Something-Something V1, Something-Something V2, and Charades. Using only RGB inputs, our method achieves results on par with or better than current state-of-the-art methods. Code will be made publicly available.