Abstract: In recent years, video-based person re-identification (re-ID) research has focused on how to exploit spatiotemporal clues effectively, since they can provide a comprehensive, multi-view representation of a pedestrian. Although the discriminability and correlation of spatiotemporal features are often studied, the complex relationships between these features have been relatively neglected. With multi-granularity features in particular, it remains difficult to describe the different spatial representations of the same person under different viewpoints. To address this challenge, this paper proposes a multi-granularity inter-frame relationship exploration and global residual embedding network. The method extracts more comprehensive and discriminative feature representations by exploring the interactions and global differences between multi-granularity features. Specifically, it models the dynamic relationships among features of different granularities across long video sequences and uses a structure-aware adjacency matrix to aggregate spatiotemporal information, so that cross-granularity information is integrated into each individual feature. In addition, a residual learning mechanism guides the global features toward greater diversity and reduces the negative impact of factors such as occlusion. Experiments on three mainstream benchmark datasets verify the effectiveness of the method, which significantly surpasses state-of-the-art solutions. These results indicate that explicitly identifying and exploiting the complex relationships between multi-granularity spatiotemporal features is effective for video-based person re-ID.
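
The adjacency-based aggregation described in the abstract can be illustrated with a minimal sketch. It is not the paper's network. The horizontal-stripe granularities (1, 2, 4), the cosine-similarity adjacency, the row-wise softmax, and all function names are assumptions made for illustration. The skip connection at the end is an ordinary residual addition and does not reproduce the paper's global residual embedding.

```python
import math

def split_granularities(frame, granularities=(1, 2, 4)):
    """Average-pool one frame's row features into 1, 2 and 4 horizontal parts.

    `frame` is a list of rows, each row a feature vector (list of floats).
    The stripe-based split is an assumed form of multi-granularity features.
    """
    rows = len(frame)
    dim = len(frame[0])
    parts = []
    for g in granularities:
        step = rows // g
        for i in range(g):
            chunk = frame[i * step:(i + 1) * step]
            parts.append([sum(r[d] for r in chunk) / len(chunk) for d in range(dim)])
    return parts

def cosine(u, v):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def row_softmax(matrix):
    """Normalize each row so that its entries are positive and sum to 1."""
    out = []
    for row in matrix:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def relate_and_embed(video):
    """Build an adjacency over all part features of all frames and aggregate.

    Returns the (assumed) similarity-based adjacency matrix and the fused
    node features, where fused = original + adjacency-weighted average.
    """
    # One graph node per (frame, granularity, part) feature.
    nodes = [p for frame in video for p in split_granularities(frame)]
    n, dim = len(nodes), len(nodes[0])
    adjacency = row_softmax([[cosine(u, v) for v in nodes] for u in nodes])
    aggregated = [
        [sum(adjacency[i][j] * nodes[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]
    # Plain skip connection: keep each node's own feature alongside the
    # cross-frame, cross-granularity context it received.
    fused = [[x + a for x, a in zip(node, agg)] for node, agg in zip(nodes, aggregated)]
    return adjacency, fused

# Toy input: 2 frames, 4 rows per frame, 2-dimensional row features.
video = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]],
    [[0.9, 0.1], [0.4, 0.6], [0.1, 0.9], [1.0, 0.8]],
]
adjacency, fused = relate_and_embed(video)
```

With the toy input, each frame contributes 1 + 2 + 4 = 7 part features, so the graph has 14 nodes, and every fused feature mixes information from all granularities and both frames.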

Global-Local Temporal Representations For Video Person Re-Identification

Joining Features by Global Guidance with Bi-Relevance Trihard Loss for Person Re-Identification

AA-RGTCN: Reciprocal Global Temporal Convolution Network with Adaptive Alignment for Video-Based Person Re-Identification

Gaussian-based Probability Fusion for Person Re-Identification with Taylor Angular Margin Loss

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Person Re-identification Based on Transform Algorithm

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

Relation-Guided Spatial Attention and Temporal Refinement for Video-Based Person Re-Identification

MSTN: A Multi-granular Spatial–Temporal Network for video-based person re-identification

Multi-Scale Temporal Cues Learning for Video Person Re-Identification

Cross-Modality Spatial-Temporal Transformer for Video-Based Visible-Infrared Person Re-Identification

Adaptive Graph Representation Learning for Video Person Re-identification

Iterative Local-Global Collaboration Learning Towards One-Shot Video Person Re-Identification

Spatial-Temporal Attention-aware Learning for Video-based Person Re-identification

Video-based person re-identification with complementary local and global features using a graph transformer

Multi-granular inter-frame relation exploration and global residual embedding for video-based person re-identification

Person Re-Identification By Video Ranking

Spatial and Temporal Mutual Promotion for Video-Based Person Re-Identification

Video-Based Person Re-Identification Using Spatial-Temporal Memory Coupling Network

STFE: A Comprehensive Video-Based Person Re-Identification Network Based on Spatio-Temporal Feature Enhancement

Saliency and Granularity: Discovering Temporal Coherence for Video-Based Person Re-Identification