Hierarchical Long-Short Transformer for Group Activity Recognition

Yan Zhuang, Zhaofeng He, Longteng Kong, Ming Lei
DOI: https://doi.org/10.1007/978-3-031-18913-5_18
2022-01-01
Abstract: Group activity recognition is a challenging task in computer vision that requires comprehensively modeling the diverse spatio-temporal relations among individuals and generating a group representation. In this paper, we propose a novel group activity recognition approach named Hierarchical Long-Short Transformer (HLSTrans). Based on the Transformer, it considers both long- and short-range relationships among individuals via Long-Short Transformer Blocks. Moreover, we build a hierarchical structure in HLSTrans by stacking such blocks to capture individual relations at multiple scales. By modeling long- and short-range relations hierarchically, HLSTrans enhances the representations of individuals and groups, leading to better recognition performance. We evaluate HLSTrans on the Volleyball and VolleyTactic datasets, and the experimental results demonstrate state-of-the-art performance.