Learning robust representations using recurrent neural networks for skeleton based action classification and detection

Hongsong Wang, Liang Wang
DOI: https://doi.org/10.1109/ICMEW.2017.8026278
2017-07-01
Abstract: Recently, skeleton-based action recognition has gained popularity owing to affordable depth sensors and real-time skeleton estimation algorithms. Previous approaches based on Recurrent Neural Networks (RNNs) focus on modeling the spatial configuration of skeletons and the temporal evolution of body joints. However, skeleton-based actions have certain intrinsic characteristics. For example, the starting point may vary, an action can be observed from arbitrary viewpoints, and the skeletons are noisy. To this end, we present a novel end-to-end RNN-based architecture that learns robust representations from raw skeletons. The architecture includes three new layers, i.e., a starting point transformation layer, a viewpoint transformation layer, and a spatial dropout layer, which address these three problems, respectively. We apply the proposed method to two different tasks: action classification and action detection. Experiments on two large-scale datasets (NTU RGB+D and PKU-MMD) show the superiority of our model. In particular, for action detection, our results are more than 33.4% higher than the previous results.
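The abstract names the three layers but does not say how each one is computed. The Python sketch below illustrates one plausible, non-learned reading of each transformation on a toy skeleton sequence, purely as an aid to intuition. Everything in it is an assumption rather than the authors' formulation: the function names are hypothetical, the "starting point" is taken to be the centroid of the first frame's joints, the viewpoint change is modeled as a rotation about the vertical (y) axis by a given angle (in the described end-to-end architecture such parameters would presumably be learned), and spatial dropout is modeled as masking whole joints with the same mask in every frame.

```python
import math
import random
from typing import List, Tuple

# A joint is an (x, y, z) coordinate; a frame is a list of joints; a sequence is a list of frames.
Joint = Tuple[float, float, float]
Frame = List[Joint]
Sequence = List[Frame]


def starting_point_transform(seq: Sequence) -> Sequence:
    """Hypothetical: translate all frames so the first frame's joint centroid is at the origin."""
    first = seq[0]
    n = len(first)
    cx = sum(j[0] for j in first) / n
    cy = sum(j[1] for j in first) / n
    cz = sum(j[2] for j in first) / n
    return [[(x - cx, y - cy, z - cz) for (x, y, z) in frame] for frame in seq]


def viewpoint_transform(seq: Sequence, theta: float) -> Sequence:
    """Hypothetical: rotate every joint about the vertical (y) axis by theta radians.

    Uses the standard rotation matrix R_y = [[cos, 0, sin], [0, 1, 0], [-sin, 0, cos]].
    Here theta is an input; a learned layer would predict or learn it instead.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[(c * x + s * z, y, -s * x + c * z) for (x, y, z) in frame] for frame in seq]


def spatial_dropout(seq: Sequence, p: float, rng: random.Random) -> Sequence:
    """Hypothetical: zero out whole joints with probability p, using one mask for all frames."""
    num_joints = len(seq[0])
    keep = [rng.random() >= p for _ in range(num_joints)]
    return [
        [joint if keep[k] else (0.0, 0.0, 0.0) for k, joint in enumerate(frame)]
        for frame in seq
    ]


# Toy sequence: 2 frames with 3 joints each (illustrative values only).
toy_seq: Sequence = [
    [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 2.0, 0.0)],
    [(1.5, 0.0, 0.0), (2.5, 1.0, 0.0), (3.5, 2.0, 0.0)],
]

if __name__ == "__main__":
    # Apply the three transformations in sequence, as a preprocessing chain before an RNN.
    out = spatial_dropout(
        viewpoint_transform(starting_point_transform(toy_seq), math.pi / 4),
        p=0.2,
        rng=random.Random(0),
    )
    print(out)
```

Applying the same joint mask across all frames (rather than dropping individual coordinates independently) is a design choice meant to mimic a joint that is missing or unreliable for a whole clip; whether the paper's spatial dropout layer works this way is not stated in the abstract.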
What problem does this paper attempt to address?