Learning Explicit Shape And Motion Evolution Maps For Skeleton-Based Human Action Recognition

Hong Liu, Juanhui Tu, Mengyuan Liu, Runwei Ding
DOI: https://doi.org/10.1109/icassp.2018.8462061
2018-01-01
Abstract: Human action recognition based on skeleton sequences has wide applications in human-computer interaction and intelligent surveillance. Although previous methods have successfully applied Long Short-Term Memory (LSTM) networks to model the shape evolution of human actions, recognizing actions efficiently from sequential data remains a problem, especially for similar actions, because motion details are lacking. To solve this problem, this paper presents an improved LSTM-based network that jointly learns explicit long-term shape evolution maps (SEM) and motion evolution maps (MEM). First, human actions are represented as compact SEM and MEM, which compensate for each other. Second, these maps are jointly learned by deep LSTM networks to explore high-level temporal dependencies. Then, a weighted aggregate layer (WAL) is designed to aggregate the outputs of the LSTM networks across different temporal stages. Finally, deep features of shape and motion are combined by decision-level fusion. Experimental results on NTU RGB+D, currently the largest dataset, and on the public SmartHome dataset verify that our method significantly outperforms state-of-the-art methods.
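The data flow described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration: the SEM is taken to be the per-frame joint coordinates stacked over time, the MEM is taken to be frame-to-frame differences of the SEM, the weighted aggregate step is a normalized weighted sum of per-stage outputs, and decision-level fusion is an average of class probabilities. All function names are hypothetical, and the LSTM networks themselves are omitted.

```python
def shape_evolution_map(frames):
    """Assumed SEM: flatten each frame's (x, y, z) joints into one row.

    frames: list of T frames, each a list of J (x, y, z) tuples.
    Returns T rows of 3*J values.
    """
    return [[c for joint in frame for c in joint] for frame in frames]


def motion_evolution_map(sem):
    """Assumed MEM: difference between consecutive SEM rows (T-1 rows)."""
    return [[b - a for a, b in zip(prev, curr)]
            for prev, curr in zip(sem, sem[1:])]


def weighted_aggregate(stage_outputs, stage_weights):
    """Assumed WAL stand-in: weighted sum of per-stage output vectors.

    Weights are assumed positive and are normalized to sum to 1.
    """
    total = sum(stage_weights)
    dim = len(stage_outputs[0])
    return [sum(w / total * out[i]
                for w, out in zip(stage_weights, stage_outputs))
            for i in range(dim)]


def decision_level_fusion(shape_probs, motion_probs):
    """Assumed fusion rule: average the two streams' class probabilities."""
    return [(s + m) / 2 for s, m in zip(shape_probs, motion_probs)]


# Toy input: 3 frames, 2 joints each.
frames = [
    [(0, 0, 0), (1, 1, 1)],
    [(1, 0, 0), (1, 2, 1)],
    [(3, 0, 0), (1, 2, 4)],
]
sem = shape_evolution_map(frames)
mem = motion_evolution_map(sem)
```

In a full model, each map would be fed to its own LSTM stream before aggregation and fusion; in the paper the aggregation weights may be learned rather than fixed, which this sketch does not attempt to reproduce.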