Group Activity Representation Learning with Self-supervised Predictive Coding.

Longteng Kong, Zhaofeng He, Man Zhang, Yunzhi Xue
DOI: https://doi.org/10.1007/978-3-031-18913-5_16
2022-01-01
Abstract: This paper aims to learn group activity representations in an unsupervised fashion, without manually annotated activity labels. To achieve this, we exploit self-supervised learning based on group predictions and propose a Transformer-based Predictive Coding approach (TransPC), which mines meaningful spatio-temporal features of group activities merely from the data itself. First, in TransPC, a Spatial Graph Transformer Encoder (SGT-Encoder) is designed to capture the diverse spatial states that lie in individual actions and group interactions. Then, a Temporal Causal Transformer Decoder (TCT-Decoder) is used to anticipate future group states by attending to the observed state dynamics. Furthermore, because group states are complex, we consider both the distinguishability and the consistency of the predicted states and introduce a joint learning mechanism to optimize the models, enabling TransPC to learn better group activity representations. Finally, extensive experiments are carried out to evaluate the learnt representation on downstream tasks on the Volleyball and Collective Activity datasets, demonstrating state-of-the-art performance over existing self-supervised learning approaches with fewer training labels.
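The abstract names two ingredients that a short sketch can make concrete: a causal decoder that only attends to already observed states, and a joint objective over the distinguishability and consistency of predicted states. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the loss formulas, so the contrastive (InfoNCE-style) term, the mean-squared-error term, the temperature, and all function names (`causal_mask`, `info_nce`, `consistency`) are hypothetical choices made here for illustration.

```python
import math


def causal_mask(num_steps):
    """Return a T x T boolean mask; entry [t][s] is True when step t may attend to step s."""
    return [[s <= t for s in range(num_steps)] for t in range(num_steps)]


def dot(a, b):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(predicted, candidates, positive_index, temperature=0.1):
    """Assumed distinguishability term: negative log-softmax score of the true
    future state among a set of candidate states."""
    logits = [dot(predicted, c) / temperature for c in candidates]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[positive_index]


def consistency(predicted, target):
    """Assumed consistency term: mean squared error between the predicted state
    and the encoded true future state."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
```

A toy usage, with made-up two-dimensional states and an unweighted sum of the two terms (the weighting is also an assumption):

```python
mask = causal_mask(3)                      # step 0 sees only itself, step 2 sees steps 0..2
predicted = [1.0, 0.0]                     # hypothetical decoder output for a future step
candidates = [[1.0, 0.0], [0.0, 1.0]]      # index 0 is the true future state
loss = info_nce(predicted, candidates, 0) + consistency(predicted, candidates[0])
```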