Discriminative Feature Learning with Constraints of Category and Temporal for Action Recognition.

Zhize Wu, Shouhong Wan, Peiquan Jin, Lihua Yue
DOI: https://doi.org/10.1007/978-3-319-21963-9_16
2015-01-01
Abstract: Recently, with the availability of depth cameras, many studies of human action recognition have been conducted on depth sequences. Motivated by the observations that each pose has a relative location within a complete action sequence and that similar actions differ in fine spatio-temporal details, we propose a novel method to recognize human actions from depth information. Representations of depth maps are learned and reconstructed using a stacked denoising autoencoder. By adding category and temporal constraints, the learned features become more discriminative: they capture the subtle but significant differences between actions and mitigate the nuisance variability of temporal misalignment. The deep neural network is trained with a greedy layer-wise strategy. We then apply temporal pyramid matching to the learned features to generate a temporal representation of each sequence. Finally, a linear SVM is trained to classify each sequence into an action. We compare our proposal with previous methods on the MSR Action3D dataset; the results show that the proposed method significantly outperforms traditional models and is comparable to state-of-the-art action recognition performance. Experimental results also indicate that our model is effective at restoring highly noisy input data.
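The abstract names the stages of the pipeline but not their equations, so the following Python/NumPy sketch is an illustration under stated assumptions, not the authors' implementation. `constrained_dae_loss` is the objective of one tied-weight denoising autoencoder layer (masking noise, sigmoid units, squared error against the clean input). The category and temporal penalty terms, their weights `lam_cat` and `lam_tmp`, and the bandwidth `tau` are hypothetical stand-ins, because the abstract does not define the constraints. `temporal_pyramid` pools per-frame codes over 1, 2, 4, ... equal temporal segments into a fixed-length descriptor; the choice of max pooling is also an assumption. All function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input entry independently with probability drop_prob."""
    return x * (rng.random(x.shape) >= drop_prob)


def constrained_dae_loss(X, labels, t, W, b, b_prime,
                         lam_cat=0.1, lam_tmp=0.1, tau=0.1, drop_prob=0.3, seed=0):
    """Loss for ONE tied-weight denoising autoencoder layer plus two HYPOTHETICAL
    penalties standing in for the category and temporal constraints.

    X: (n, d) frame features scaled to [0, 1]; labels: (n,) action ids;
    t: (n,) relative position of each frame within its sequence, in [0, 1].
    """
    rng = np.random.default_rng(seed)
    H = sigmoid(corrupt(X, drop_prob, rng) @ W + b)   # encode the corrupted frames
    X_hat = sigmoid(H @ W.T + b_prime)                 # decode with tied weights
    recon = np.mean(np.sum((X_hat - X) ** 2, axis=1))  # compare with the CLEAN frames

    # Assumed category term: within-class scatter of the codes (pull toward class mean).
    cat = sum(np.sum((H[labels == c] - H[labels == c].mean(axis=0)) ** 2)
              for c in np.unique(labels)) / len(X)

    # Assumed temporal term: same-class frames at nearby relative positions t
    # should get similar codes; the Gaussian weight decays with |t_i - t_j|.
    same_class = labels[:, None] == labels[None, :]
    w = np.exp(-((t[:, None] - t[None, :]) ** 2) / tau ** 2) * same_class
    sq_dist = np.sum((H[:, None, :] - H[None, :, :]) ** 2, axis=2)
    tmp = np.sum(w * sq_dist) / np.sum(w)

    return recon + lam_cat * cat + lam_tmp * tmp


def temporal_pyramid(F, levels=3):
    """Pool per-frame codes F (T, k) over 1, 2, 4, ... equal temporal segments and
    concatenate, giving a fixed (2**levels - 1) * k descriptor for any length T."""
    T = F.shape[0]
    parts = []
    for level in range(levels):
        bounds = np.linspace(0, T, 2 ** level + 1).astype(int)
        for s, e in zip(bounds[:-1], bounds[1:]):
            parts.append(F[s:max(e, s + 1)].max(axis=0))  # max pooling (assumed)
    return np.concatenate(parts)
```

In a greedy layer-wise setup, each layer would minimize an objective of this kind (for example by gradient descent) with the previous layer's codes as its input. The per-frame codes of a sequence would then pass through `temporal_pyramid`, and because sequences of different lengths yield descriptors of the same size, they can be fed directly to a linear SVM such as scikit-learn's `LinearSVC`.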