Self-Supervised Motion Perception for Spatiotemporal Representation Learning

Chang Liu, Yuan Yao, Dezhao Luo, Yu Zhou, Qixiang Ye
DOI: https://doi.org/10.1109/tnnls.2022.3160860
IF: 14.255
2022-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: In this study, we propose a novel pretext task and a self-supervised motion perception (SMP) method for spatiotemporal representation learning. The pretext task is video playback rate perception, which uses temporal dilated sampling to augment each video clip into multiple duplicates with different temporal resolutions. The SMP method is built upon discriminative and generative motion perception models, which collaboratively capture representations of motion dynamics and appearance from clips at these multiple temporal resolutions. To strengthen this collaboration, we further propose difference and convolution motion attention (MA), which drives the generative model to focus on motion-related appearance, and we leverage multiple granularity perception (MG) to extract accurate motion dynamics. Extensive experiments demonstrate SMP's effectiveness for video motion perception and show that the resulting self-supervised representations achieve state-of-the-art performance on target tasks, including action recognition and video retrieval. Code for SMP is available at github.com/yuanyao366/SMP.
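The playback rate perception pretext task can be illustrated with a minimal sketch of temporal dilated sampling. It assumes a hypothetical set of playback rates (1, 2, 4, 8), a fixed clip length, and an independent random start per rate; these choices, and the names `PLAYBACK_RATES`, `dilated_indices`, and `make_rate_samples`, are illustrative assumptions and not taken from the authors' code at github.com/yuanyao366/SMP. Each sampled clip takes every `rate`-th frame, and its label is the index of that rate, so a classifier trained on these pairs must perceive how fast the video is "played".

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical playback rates (temporal dilations); the paper's actual set may differ.
PLAYBACK_RATES: Tuple[int, ...] = (1, 2, 4, 8)


def dilated_indices(num_frames: int, clip_len: int, rate: int, start: int) -> List[int]:
    """Frame indices for one clip: clip_len frames spaced `rate` frames apart."""
    span = (clip_len - 1) * rate + 1  # number of source frames the clip covers
    if start < 0 or start + span > num_frames:
        raise ValueError("clip does not fit in the video at this rate")
    return [start + i * rate for i in range(clip_len)]


def make_rate_samples(num_frames: int, clip_len: int,
                      rates: Sequence[int] = PLAYBACK_RATES,
                      rng: Optional[random.Random] = None) -> List[Tuple[List[int], int]]:
    """Return (frame_indices, rate_label) pairs, one per usable playback rate.

    The label is the position of the rate in `rates`, so predicting it is a
    classification-style playback rate perception task (an assumption of this sketch).
    """
    rng = rng or random.Random()
    samples = []
    for label, rate in enumerate(rates):
        span = (clip_len - 1) * rate + 1
        if span > num_frames:
            continue  # video too short for this dilation; skip the rate
        start = rng.randint(0, num_frames - span)  # randint is inclusive on both ends
        samples.append((dilated_indices(num_frames, clip_len, rate, start), label))
    return samples
```

For a 64-frame video and 8-frame clips, `make_rate_samples(64, 8)` yields four clips whose indices are spaced 1, 2, 4, and 8 frames apart, labeled 0 to 3. In practice the indices would be used to gather decoded frames before feeding them to the discriminative and generative models; how SMP pairs these clips during training is not reproduced here.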
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?