Learning Action Correlation and Temporal Aggregation for Group Representation

Haoting Wang, Kan Li, Xin Niu
DOI: https://doi.org/10.1007/978-3-030-80119-9_53
2021-01-01
Abstract: In this work, we propose a deep graph model for collective activity recognition. Starting from each person's visual embedding, we explore action correlation to construct contextual information through GNN reasoning. Our proposed layer generates a local evolution descriptor for each person, which contains action correlation and spatial information. In addition, we design a temporal aggregation module that encodes these descriptors into a meta-action space and then aggregates them to construct the final group representation for collective activity recognition. We conduct experiments on two collective activity recognition datasets (the Collective Activity Dataset and the Volleyball Dataset) and achieve 89.6% and 91.3% recognition accuracy, respectively, outperforming the compared state-of-the-art methods. These empirical results demonstrate the effectiveness of learning action correlation and temporal aggregation for video-level group representation.
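The abstract does not give the layer definitions, so the following is only a rough illustration of the two ideas it names, not the authors' implementation. It assumes (hypothetically) that action correlation is a softmax-normalised dot-product similarity between person embeddings, and that the meta-action space is a small codebook to which descriptors are softly assigned before pooling. The function names `action_correlation_step` and `temporal_aggregate`, the codebook, and all shapes are illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def action_correlation_step(person_feats):
    """One message-passing round over a fully connected person graph.

    person_feats: (N, D) visual embeddings for the N people in one frame.
    Edge weights are scaled dot-product similarities (an assumption here).
    """
    _, d = person_feats.shape
    scores = person_feats @ person_feats.T / np.sqrt(d)
    adjacency = softmax(scores, axis=1)          # each row sums to 1
    # Residual update: own embedding plus correlation-weighted context.
    return person_feats + adjacency @ person_feats

def temporal_aggregate(descriptors, meta_actions):
    """Pool per-person, per-frame descriptors into one group vector.

    descriptors:  (T, N, D) descriptors over T frames.
    meta_actions: (K, D) hypothetical codebook standing in for the
                  meta-action space.
    Returns an L2-normalised vector of length K * D.
    """
    t, n, d = descriptors.shape
    flat = descriptors.reshape(t * n, d)
    assign = softmax(flat @ meta_actions.T, axis=1)          # (T*N, K)
    residuals = flat[:, None, :] - meta_actions[None, :, :]  # (T*N, K, D)
    pooled = (assign[:, :, None] * residuals).sum(axis=0)    # (K, D)
    vec = pooled.reshape(-1)
    return vec / (np.linalg.norm(vec) + 1e-12)

# Toy data: 4 frames, 6 people, 8-dim embeddings, 3 meta-actions.
rng = np.random.default_rng(0)
T, N, D, K = 4, 6, 8, 3
frames = rng.normal(size=(T, N, D))
codebook = rng.normal(size=(K, D))

descriptors = np.stack([action_correlation_step(f) for f in frames])
group_repr = temporal_aggregate(descriptors, codebook)
print(group_repr.shape)  # (24,)
```

In a trained model the embeddings, similarity function, and codebook would all be learned, and the resulting group vector would feed a classifier over activity labels; here they are random placeholders chosen only to show the data flow.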