MUP: Multi-granularity Unified Perception for Panoramic Activity Recognition

Meiqi Cao, Rui Yan, Xiangbo Shu, Jiachao Zhang, Jinpeng Wang, Guo-Sen Xie
DOI: https://doi.org/10.1145/3581783.3612435
2023-01-01
Abstract: Panoramic activity recognition requires jointly identifying multi-granularity human behaviors, including individual actions, group activities, and global activities, in multi-person videos. Previous methods encode these behaviors hierarchically through multiple stages, which disturbs the inherent co-occurrence across multi-granularity behaviors in the same scene. To this end, we propose a novel Multi-granularity Unified Perception (MUP) framework that perceives behaviors at different granularities in a unified manner, exploring their co-occurring motion patterns via the same parameters in an end-to-end fashion. Specifically, the proposed framework stacks three Unified Motion Encoding (UME) blocks that model behaviors at multiple granularities with shared parameters. Each UME block mines intra-relevant and cross-relevant semantics synchronously from input feature sequences via Intra-granularity Motion Embedding (IME) and Cross-granularity Motion Prototyping (CMP). In particular, IME models the interactions among visual features within each granularity using the attention mechanism, while CMP aggregates features across different granularities (e.g., person to group) via several learnable prototypes. Extensive experiments demonstrate that MUP outperforms state-of-the-art methods on JRDB-PAR and offers satisfactory interpretability.
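The abstract gives no equations, so the following is only a minimal NumPy sketch of the two ideas it names, not the authors' implementation. It assumes that IME can be approximated by single-head self-attention with a residual connection, and that CMP can be approximated by softmax-weighted pooling of features onto a small set of prototype vectors. All function names, shapes, and the parameter dictionary are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def intra_granularity_attention(feats, w_q, w_k, w_v):
    # IME-like step (assumed form): self-attention among the N features
    # of a single granularity, followed by a residual connection.
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (N, N)
    return feats + softmax(scores, axis=-1) @ v  # (N, D)

def prototype_aggregate(feats, prototypes):
    # CMP-like step (assumed form): each of the K prototypes pools the
    # N finer-grained features with softmax weights, giving K coarser features.
    weights = softmax(prototypes @ feats.T / np.sqrt(feats.shape[-1]), axis=-1)  # (K, N)
    return weights @ feats                        # (K, D)

def ume_block(feats, params):
    # One hypothetical UME-style block: refine within a granularity,
    # then aggregate toward the next coarser granularity.
    refined = intra_granularity_attention(feats, *params["qkv"])
    coarser = prototype_aggregate(refined, params["prototypes"])
    return refined, coarser

rng = np.random.default_rng(0)
D, N, K = 16, 6, 2  # feature dim, number of persons, number of prototypes (all assumed)
params = {
    "qkv": [rng.normal(scale=0.1, size=(D, D)) for _ in range(3)],
    "prototypes": rng.normal(size=(K, D)),
}

person_feats = rng.normal(size=(N, D))
person_out, group_feats = ume_block(person_feats, params)
# The same parameters are reused at the next granularity (shared weights).
group_out, scene_feats = ume_block(group_feats, params)
```

Reusing `params` for both calls mirrors the shared-parameter idea in the abstract; how the actual model maps prototypes to groups and to the global activity is not stated there and is left open here.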