Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition

Haocong Rao, Shihao Xu, Xiping Hu, Jun Cheng, Bin Hu
DOI: https://doi.org/10.1016/j.ins.2021.04.023
IF: 8.1
2021-08-01
Information Sciences
Abstract:<p>Action recognition via 3D skeleton data is an important emerging topic. Most existing methods rely on hand-crafted descriptors to recognize actions, or perform supervised action representation learning with massive labels. In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL that exploits different augmentations of <em>unlabeled</em> skeleton sequences to learn action representations in an <em>unsupervised</em> manner. Specifically, we first propose to contrast the similarity between augmented instances of the input skeleton sequence, which are transformed with multiple novel augmentation strategies, to learn inherent action patterns ("<em>pattern-invariance</em>") across different skeleton transformations. Second, to encourage learning the pattern-invariance with more consistent action representations, we propose a momentum LSTM, which is implemented as the momentum-based moving average of the LSTM-based query encoder, to encode long-term action dynamics of the key sequence. Third, we introduce a <em>queue</em> to store the encoded keys, which allows flexibly reusing preceding keys to build a consistent dictionary to facilitate contrastive learning. Last, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively. Empirical evaluations show that our approach significantly outperforms hand-crafted methods by 10-50% in top-1 accuracy, and it can even achieve superior performance to many supervised learning methods.</p>
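The three mechanisms named in the abstract (a momentum-averaged key encoder, a queue of encoded keys, and a contrastive objective) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy 2-D vectors, the momentum value `m`, the `temperature`, and the choice of an InfoNCE-style loss are all assumptions made for illustration, and a plain list of floats stands in for the LSTM parameters.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Moving average: key <- m * key + (1 - m) * query (m is an assumed value)."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE-style loss (assumed here): -log softmax of the positive logit."""
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in negative_keys]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[0]


# FIFO queue of previously encoded keys; the oldest key is dropped when full.
key_queue = deque(maxlen=3)
for key in ([0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
    key_queue.append(key)

query = [1.0, 0.0]  # toy encoding of one augmented view
loss_matching = info_nce(query, [1.0, 0.0], list(key_queue))
loss_mismatched = info_nce(query, [-1.0, 0.0], list(key_queue))
```

In this toy setup `loss_matching` is smaller than `loss_mismatched`, because the query agrees with its positive key and disagrees with the queued keys, which is the behavior a contrastive objective rewards.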
computer science, information systems