Abstract: Group activity recognition aims to recognize the holistic activity in a multi-person scene, which requires considering the interactions between actors and their surroundings. It has various applications, such as public surveillance and video analysis. However, existing works merely extract scene features as a supplement to activity features and fail to adequately explore the interplay between the scene and the actors. To address this limitation, this paper proposes a Local–Global Context-Aware Graph Reasoning Model (LG-CAGR), which reasons over local and global context features to gain deeper insight into group activity within the scene. In particular, we present a feature extraction strategy that harnesses local location features and global scene attributes, complementing actor features by capturing the spatial group topology and the relative positions between actors. We then devise a local–global group reasoning module that infers pair-wise interactions between actors and scenes within a Graph Convolutional Network, capturing the correlations between the overall scene and local individuals to construct group-level features. Multi-graphs are constructed with actor features and scene features as nodes and interactions as edges. A self-attention graph pooling network is introduced to automatically integrate key actor features and form rich group-level features for recognizing group activity. On the Collective Activity Dataset, Collective Activity Extended Dataset, Volleyball Dataset, and Public Life in Public Space dataset, the proposed method reaches 94.0%, 97.7%, 92.7%, and 56.1%, respectively. Compared with existing methods that use the same backbone, it exceeds them by 1%, 2.1%, 0.3%, and 14.9%, respectively, affirming the superiority of the proposed method over state-of-the-art methods.
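The pipeline described in the abstract (actor and scene features as graph nodes, interactions as edges, graph convolution, then self-attention graph pooling) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Every name here (`relation_graph`, `gcn_layer`, `attention_pool`, `demo`) is hypothetical, and several choices are assumptions: edges are a row-wise softmax over scaled dot-product similarity, the scene is a single extra node, all weights are random, and pooling follows the general self-attention graph pooling recipe (score nodes with a graph convolution, keep the top-scoring nodes, gate them by their scores, average).

```python
import math
import random


def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relation_graph(nodes):
    """Assumed edge weights: row-wise softmax of scaled dot-product similarity."""
    d = len(nodes[0])
    sims = [[sum(x * y for x, y in zip(u, v)) / math.sqrt(d) for v in nodes]
            for u in nodes]
    return [softmax(r) for r in sims]


def gcn_layer(adj, nodes, weight):
    """One graph convolution: H' = ReLU(A H W)."""
    out = matmul(matmul(adj, nodes), weight)
    return [[max(0.0, v) for v in row] for row in out]


def attention_pool(adj, nodes, score_vec, keep):
    """Score nodes with tanh(A H p), keep the top `keep`, gate, and average."""
    proj = matmul(matmul(adj, nodes), [[p] for p in score_vec])
    scores = [math.tanh(r[0]) for r in proj]
    top = sorted(range(len(nodes)), key=lambda i: scores[i], reverse=True)[:keep]
    gated = [[v * scores[i] for v in nodes[i]] for i in top]
    group = [sum(col) / len(gated) for col in zip(*gated)]
    return group, top


def demo(num_actors=5, dim=4, keep=3, seed=0):
    rng = random.Random(seed)
    # Random stand-ins for actor features and one global scene feature.
    actors = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_actors)]
    scene = [rng.uniform(-1, 1) for _ in range(dim)]
    nodes = actors + [scene]  # the scene joins the graph as an extra node
    weight = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    score_vec = [rng.uniform(-1, 1) for _ in range(dim)]
    adj = relation_graph(nodes)
    hidden = gcn_layer(adj, nodes, weight)
    group, kept = attention_pool(adj, hidden, score_vec, keep)
    return adj, group, kept
```

Calling `demo()` returns the normalized adjacency matrix, a group-level feature vector of length `dim`, and the indices of the nodes that survived pooling. In a trained model the random weights would be learned and the group-level vector would feed a classifier over activity labels.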

Unveiling group activity recognition: Leveraging Local–Global Context-Aware Graph Reasoning for enhanced actor–scene interactions
