Spatial-Temporal Data Augmentation Based on LSTM Autoencoder Network for Skeleton-Based Human Action Recognition

Juanhui Tu, Hong Liu, Fanyang Meng, Mengyuan Liu, Runwei Ding
DOI: https://doi.org/10.1109/icip.2018.8451608
2018-10-01
Abstract: Data augmentation is known to be crucial for the generalization of RNN-based methods for skeleton-based human action recognition. Traditional data augmentation methods apply hand-crafted transformations only in the spatial domain and therefore lack an effective temporal representation. This paper extends the traditional Long Short-Term Memory (LSTM) network and presents a novel LSTM autoencoder network (LSTM-AE) for spatial-temporal data augmentation. In the LSTM-AE, the LSTM network preserves the temporal information of skeleton sequences, and the autoencoder architecture automatically eliminates irrelevant and redundant information. In addition, a regularized cross-entropy loss is defined to guide the LSTM-AE to learn more suitable representations of skeleton data. Experimental results on the NTU RGB+D dataset, currently the largest, and on the public SmartHome dataset verify that the proposed model outperforms state-of-the-art methods and can be easily integrated with most RNN-based action recognition models.
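To make the contrast in the abstract concrete, the following is a minimal sketch of the kind of spatial-only augmentation it describes as traditional: one rotation and one scale factor applied identically to every joint in every frame. It is not the paper's LSTM-AE. The function names, the choice of the y axis as vertical, and the angle and scale ranges are all assumptions made for illustration.

```python
import math
import random


def rotate_y(joint, angle):
    """Rotate one (x, y, z) joint about the y axis by `angle` radians."""
    x, y, z = joint
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def augment_spatial(sequence, angle, scale):
    """Apply the same rotation and uniform scale to every joint of every frame.

    `sequence` is a list of frames; each frame is a list of (x, y, z) joints.
    Frame order and frame count are left untouched.
    """
    return [
        [tuple(scale * v for v in rotate_y(joint, angle)) for joint in frame]
        for frame in sequence
    ]


def random_spatial_augment(sequence, max_angle=math.pi / 6,
                           scale_range=(0.9, 1.1), rng=random):
    """Draw one random angle and scale (hypothetical ranges) per sequence."""
    angle = rng.uniform(-max_angle, max_angle)
    scale = rng.uniform(scale_range[0], scale_range[1])
    return augment_spatial(sequence, angle, scale)


# Toy skeleton sequence: 3 frames, 2 joints per frame (made-up coordinates).
sequence = [
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 1.0, 0.1), (1.0, 1.1, 0.1)],
    [(0.0, 1.0, 0.2), (1.0, 1.2, 0.2)],
]
augmented = random_spatial_augment(sequence, rng=random.Random(0))
```

Because a single transform is shared by all frames, the augmented sequence has the same number of frames, the same ordering, and the same relative motion between frames as the input. This is the sense in which such augmentation acts only in the spatial domain; the abstract's LSTM-AE is proposed to address that gap.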