Two-Stream Network with 3D Common-Specific Framework for RGB-D Action Recognition

Xiaolei Qin, Yongxin Ge, Jinyuan Feng, Yida Chen, Liuwei Zhan, Xuchu Wang, Yuangan Wang
DOI: https://doi.org/10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00159
2019-01-01
Abstract: This paper presents a novel end-to-end network, TSN-3DCSF, for RGB-D action recognition in video. Because depth information expresses the relationships between different body parts well, which is very helpful for action recognition, we use it together with RGB information as input to our framework. Although the characteristics of these two modalities are quite different, they carry consistent semantic information, so extracting their common and specific features is meaningful for action recognition. Unlike most works, which obtain temporal information from optical flow, our approach uses 3D convolution to build a layer that extracts temporal information and common-specific features simultaneously, which improves accuracy and reduces the amount of computation. Extensive experiments on three widely used RGB-D action datasets show that our method achieves performance comparable to state-of-the-art methods.
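
To make the idea of a 3D convolution over two modalities concrete, the following is a minimal pure-Python sketch, not the TSN-3DCSF implementation. It assumes, purely for illustration, that "common" features come from one kernel shared by the RGB and depth streams and "specific" features come from a separate kernel per stream; the abstract does not state how the paper realizes this. All names (`conv3d_valid`, `common_specific_features`, the toy kernels and clips) are hypothetical.

```python
# Illustrative sketch only: the function names, kernels, and the
# shared-versus-private kernel split are assumptions, not the paper's design.

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over a clip indexed [t][y][x]."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans time (dt) as well as space (dy, dx),
                # so one pass mixes temporal and spatial information.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def common_specific_features(rgb_clip, depth_clip, shared_kernel,
                             rgb_kernel, depth_kernel):
    """Assumed split: shared kernel -> common, per-stream kernel -> specific."""
    return {
        "rgb_common": conv3d_valid(rgb_clip, shared_kernel),
        "depth_common": conv3d_valid(depth_clip, shared_kernel),
        "rgb_specific": conv3d_valid(rgb_clip, rgb_kernel),
        "depth_specific": conv3d_valid(depth_clip, depth_kernel),
    }


# Toy kernels: a temporal difference (2x1x1), a spatial mean (1x2x2),
# and an identity (1x1x1).
temporal_diff = [[[-1.0]], [[1.0]]]
spatial_mean = [[[0.25, 0.25], [0.25, 0.25]]]
identity = [[[1.0]]]

# Toy single-channel clips of 3 frames, 2x2 pixels each:
# one brightens over time, the other is static.
moving_clip = [[[float(t)] * 2 for _ in range(2)] for t in range(3)]
static_clip = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]

feats = common_specific_features(moving_clip, static_clip,
                                 temporal_diff, spatial_mean, identity)
```

In this toy setup the shared temporal-difference kernel responds to the brightening clip and gives zero on the static one, which shows how a kernel with temporal extent can capture motion directly from frames without a separate optical-flow step.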