Improving Action Recognition with the Graph-Neural-Network-based Interaction Reasoning

Wu Luo, Chongyang Zhang, Xiaoyun Zhang, Haiyan Wu
DOI: https://doi.org/10.1109/VCIP47243.2019.8965768
2019-01-01
Abstract: Recent human action recognition methods mainly build on two-stream or 3D convolutional deep networks, which can effectively exploit human spatial-temporal features. However, because most of these methods ignore interactions, their performance remains limited. In this paper, we propose a novel action recognition framework with Graph Convolutional Network (GCN) based interaction reasoning: objects and discriminative scene patches are detected using an object detector and class activation mapping (CAM), respectively, and a GCN is then introduced to model the interactions among the detected objects and scene patches. Evaluation on two widely used video action benchmarks shows that the proposed method achieves comparable performance: accuracy of up to 43.6% on EPIC Kitchen and 47.0% on the VLOG benchmark, without using optical flow.
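The interaction-reasoning step described in the abstract can be illustrated with a minimal sketch of one graph convolution over object and scene-patch nodes. The abstract does not specify how the graph is built or which GCN variant is used, so everything below is an assumption made for illustration: the cosine-similarity edge weights, the symmetric normalization with self-loops, the feature sizes, the random stand-in features, and the helper names (`similarity_adjacency`, `normalize_adjacency`, `gcn_layer`) are all hypothetical and not taken from the paper.

```python
import numpy as np


def similarity_adjacency(features):
    """Hypothetical edge weights: non-negative cosine similarity between nodes."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # self-loops are added during normalization
    return np.clip(sim, 0.0, None)


def normalize_adjacency(adj):
    """Add self-loops, then apply D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)  # >= 1 because weights are non-negative
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features, adj_norm, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


# Stand-in features (assumed shapes): 3 detected objects and 2 scene patches,
# each described by an 8-dimensional vector.
rng = np.random.default_rng(0)
object_feats = rng.normal(size=(3, 8))
patch_feats = rng.normal(size=(2, 8))

# Objects and scene patches become nodes of a single graph.
nodes = np.concatenate([object_feats, patch_feats], axis=0)
adj_norm = normalize_adjacency(similarity_adjacency(nodes))

# Randomly initialized layer weights; in practice these would be learned.
weight = rng.normal(size=(8, 4))
hidden = gcn_layer(nodes, adj_norm, weight)

# Mean-pool node states into one interaction descriptor for the clip.
video_repr = hidden.mean(axis=0)
```

In a full system such a pooled descriptor would presumably be combined with the backbone's appearance features before classification, but the abstract does not say how, so that step is left out here.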