Spatial-Temporal Asynchronous Normalization for Unsupervised 3D Action Representation Learning

Mengyuan Liu, Youneng Bao, Yongsheng Liang, Fanyang Meng
DOI: https://doi.org/10.1109/lsp.2022.3144898
2022-01-01
IEEE Signal Processing Letters
Abstract: Unsupervised 3D action representation learning from skeleton sequences has attracted increasing attention in recent years. Existing methods have successfully applied autoencoder networks to learn 3D action representations by reconstructing the original skeleton sequence. However, these methods ignore motion cues and therefore struggle to distinguish actions that share similar shape information but differ slightly in motion information. Instead of reconstructing the original skeleton sequence, we learn distinctive 3D action representations with an autoencoder network by reconstructing a normalized motion sequence extracted from the original input. To obtain the normalized motion sequence, we design a novel spatial-temporal asynchronous normalization (STAN) method, which normalizes the original skeleton sequence in two steps. First, STAN reduces redundant temporal information and extracts a motion sequence by subtracting the mean value along the temporal dimension. Second, STAN further normalizes the motion sequence along the spatial dimension, generating a normalized motion sequence that is less affected by differences in human body shape. Extensive experiments on the large-scale NTU RGB+D 60 and NTU RGB+D 120 datasets verify the effectiveness of the proposed STAN method, which achieves results comparable to state-of-the-art methods and also outperforms alternative normalization methods.
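The two-step normalization described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract only says the second step normalizes "along the spatial dimension", so `spatial_step` below assumes a per-frame, per-axis z-score across joints. The names `temporal_step`, `spatial_step`, `stan`, and the `eps` parameter are hypothetical. A sequence is indexed as `seq[t][j][c]` (frame, joint, coordinate).

```python
from statistics import fmean, pstdev
from typing import List

# Skeleton sequence indexed as seq[t][j][c]: T frames, J joints, C coordinates.
Sequence = List[List[List[float]]]


def temporal_step(seq: Sequence) -> Sequence:
    """Step 1: subtract each joint-coordinate's mean over time.

    The result is a "motion sequence": per-joint displacement from that
    joint's average position across the whole clip.
    """
    T, J, C = len(seq), len(seq[0]), len(seq[0][0])
    means = [[fmean(seq[t][j][c] for t in range(T)) for c in range(C)]
             for j in range(J)]
    return [[[seq[t][j][c] - means[j][c] for c in range(C)]
             for j in range(J)]
            for t in range(T)]


def spatial_step(motion: Sequence, eps: float = 1e-6) -> Sequence:
    """Step 2 (ASSUMED form): z-score each frame's coordinates across joints.

    For every frame and every coordinate axis, subtract the mean over joints
    and divide by the population standard deviation over joints. `eps`
    (hypothetical) avoids division by zero when all joints coincide.
    """
    out = []
    for frame in motion:
        J, C = len(frame), len(frame[0])
        stats = []
        for c in range(C):
            column = [frame[j][c] for j in range(J)]
            stats.append((fmean(column), pstdev(column)))
        out.append([[(frame[j][c] - stats[c][0]) / (stats[c][1] + eps)
                     for c in range(C)]
                    for j in range(J)])
    return out


def stan(seq: Sequence) -> Sequence:
    """Apply the steps asynchronously: temporal first, then spatial."""
    return spatial_step(temporal_step(seq))


if __name__ == "__main__":
    # Toy clip: 3 frames, 2 joints, 1 coordinate each.
    clip = [[[0.0], [2.0]], [[1.0], [4.0]], [[2.0], [6.0]]]
    print(temporal_step(clip))
    print(stan(clip))
```

In this sketch the temporal step removes each joint's static average position, leaving only how it moves. The spatial step then rescales every frame so that skeletons of different sizes yield motions of comparable magnitude, which is one plausible reading of "suffers less from the effect of different human body shapes".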