Abstract: Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios. Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning. However, they rarely explore key body structure and motion such as gait to focus on more important body joints or limbs, while lacking the ability to fully mine valuable spatial-temporal sub-patterns of skeletons to enhance model learning. This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos) that exploits structure-specific and gait-related body relations as well as combinatorial features of skeleton graphs to learn effective skeleton representations for person re-ID. In particular, motivated by the locality within joints' structure and the body-component collaboration in gait, we first propose the motif guided graph transformer (MGT) that incorporates hierarchical structural motifs and gait collaborative motifs, which simultaneously focuses on multi-order local joint correlations and key cooperative body parts to enhance skeleton relation learning. Then, we devise the combinatorial skeleton prototype learning (CSP) that leverages random spatial-temporal combinations of joint nodes and skeleton graphs to generate diverse sub-skeleton and sub-tracklet representations, which are contrasted with the most representative features (prototypes) of each identity to learn class-related semantics and discriminative skeleton representations. Extensive experiments validate the superior performance of MoCos over existing state-of-the-art models. We further show its generality under RGB-estimated skeletons, different graph modeling, and unsupervised scenarios.

What problem does this paper attempt to address?

This paper addresses person re-identification (re-ID) based on 3D skeleton data. Specifically, the authors point out two shortcomings of existing methods:

1. **Assumed virtual motion relations**: Existing methods usually assume virtual motion relations between all joints and learn from averaged joint or sequence representations, but rarely explore key body structure and motion (such as gait) to focus on the more important body joints or limbs.
2. **Insufficient mining of spatio-temporal sub-patterns**: Existing methods cannot fully mine the valuable spatio-temporal sub-patterns of skeletons, which limits model learning.

To address these problems, the authors propose a new method, **Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning (MoCos)**, which improves person re-ID in the following ways.

### Method overview

#### 1. Motif Guided Graph Transformer (MGT)

MGT uses hierarchical structural motifs and gait collaborative motifs to guide joint-relation learning, thereby capturing richer skeletal patterns. Specifically:

- **Hierarchical Structural Motif (HSM)**: By defining neighbor relations of different orders, it captures multi-level dependencies between joints.
- **Gait Collaborative Motif (GCM)**: It focuses on the local and global motion relations between upper-limb and lower-limb joints to enhance the learning of gait features.

The motif matrices are defined as follows:

\[ A_m(i,j) = \begin{cases} 1 & \text{if } j \in \bigcup_{k = 1}^m N_k(i) \\ 0 & \text{otherwise} \end{cases} \]

where \( m\in\{1,2,3\} \), \( A_m\in\mathbb{R}^{J\times J} \) is the \(m\)-order HSM matrix, and \( N_k(i) \) is the index set of the \(k\)-order neighbors of the \(i\)-th joint node.
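As a rough illustration of the HSM definition above, the sketch below builds \( A_m \) for a toy skeleton. It assumes that a "\(k\)-order neighbor" means a joint exactly \(k\) hops away along the skeleton's bones and that a joint is not its own neighbor; the summary does not state either point, and the function names and the 5-joint chain are hypothetical.

```python
from collections import deque


def hop_distances(num_joints, edges):
    """Hop counts between all joint pairs, via BFS from every joint."""
    adj = [[] for _ in range(num_joints)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * num_joints for _ in range(num_joints)]
    for s in range(num_joints):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def hsm_matrix(num_joints, edges, m):
    """A_m(i, j) = 1 if j is a k-order neighbor of i for some 1 <= k <= m.

    Assumption: k-order neighbor = joint exactly k hops away (self excluded).
    """
    dist = hop_distances(num_joints, edges)
    return [
        [1 if dist[i][j] is not None and 1 <= dist[i][j] <= m else 0
         for j in range(num_joints)]
        for i in range(num_joints)
    ]


# Toy skeleton (hypothetical): a chain of 5 joints, 0-1-2-3-4.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(hsm_matrix(5, chain, 2)[0])  # [0, 1, 1, 0, 0]
```

Raising \(m\) only ever adds entries, so \( A_1 \), \( A_2 \), and \( A_3 \) form nested masks that cover progressively larger joint neighborhoods.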
\[ B_m(i,j) = \begin{cases} 1 & \text{if } i\in I_m, j\in\bigcup_{k = 1}^2 I_k, j\neq i \\ 0 & \text{otherwise} \end{cases} \]

where \( m\in\{1,2\} \), \( B_1, B_2\in\mathbb{R}^{J\times J} \) are the GCM matrices of the upper and lower limbs, and \( I_1 \) and \( I_2 \) are the index sets of the upper-limb and lower-limb joint nodes respectively.

#### 2. Combinatorial Skeleton Prototype Learning (CSP)

CSP generates combinatorial sub-skeleton and sub-tracklet representations through random masking, thereby learning more representative skeleton features. The specific steps are as follows:

- **Sub-skeleton representation**: Generate sub-skeleton representations by randomly masking joint nodes.

  \[ \hat{v}_t=\frac{1}{N_S}\sum_{j = 1}^J x_j h_t^j \]

  where \( x_j\sim\text{Bernoulli}(1 - p_s) \), and \( N_S=\sum_{j = 1}^J x_j \) is the number of unmasked nodes.

- **Sub-tracklet representation**: Generate sub-tracklet representations by randomly masking consecutive skeleton frames.

  \[ V=\frac{1}{N_T}\sum_{t = 1}^f m_t\hat{v}_t \]

  where \( m_t\sim\text{Bernoulli}(1 - p_t) \), \( N_T
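The GCM mask and the two masking steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: \( h_t^j \) is taken to be the feature vector of joint \(j\) at frame \(t\), the fallback for an all-masked draw is invented here so that the average stays defined, and all function names are hypothetical.

```python
import random


def gcm_matrix(num_joints, limb_sets, m):
    """B_m(i, j) = 1 if i is in I_m and j is another joint in I_1 or I_2.

    limb_sets = (I_1, I_2): upper-limb and lower-limb joint indices.
    """
    rows = set(limb_sets[m - 1])
    cols = set().union(*limb_sets)
    return [
        [1 if i in rows and j in cols and j != i else 0
         for j in range(num_joints)]
        for i in range(num_joints)
    ]


def masked_mean(vectors, keep_prob, rng):
    """Average the vectors kept by independent Bernoulli(keep_prob) draws."""
    keep = [1 if rng.random() < keep_prob else 0 for _ in vectors]
    n = sum(keep)
    if n == 0:
        # Assumption (not from the source): if everything is masked,
        # keep one vector at random so the division is well defined.
        keep[rng.randrange(len(vectors))] = 1
        n = 1
    dim = len(vectors[0])
    return [sum(k * v[d] for k, v in zip(keep, vectors)) / n
            for d in range(dim)]


def sub_tracklet(h, p_s, p_t, rng):
    """h[t][j] is the feature vector of joint j at frame t (assumed layout).

    Joints are masked with probability p_s per frame (sub-skeleton step),
    then frames are masked with probability p_t (sub-tracklet step).
    """
    sub_skeletons = [masked_mean(frame, 1 - p_s, rng) for frame in h]
    return masked_mean(sub_skeletons, 1 - p_t, rng)


# Toy input (hypothetical): 2 frames, 2 joints, 2-dimensional features.
h = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
rng = random.Random(0)
V = sub_tracklet(h, p_s=0.5, p_t=0.5, rng=rng)
```

With \( p_s = p_t = 0 \) nothing is masked and the result reduces to the plain average over joints and frames; larger masking probabilities yield more varied sub-skeleton and sub-tracklet combinations from the same sequence.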

Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification

Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification

Multi-Level Graph Encoding with Structural-Collaborative Relation Learning for Skeleton-Based Person Re-Identification

Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification

A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification

SimMC: Simple Masked Contrastive Learning of Skeleton Representations for Unsupervised Person Re-Identification

Exploring High-Order Spatio–Temporal Correlations from Skeleton for Person Re-Identification

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Adversarial learning-based skeleton synthesis with spatial-channel attention for robust gait recognition

SkeletonGait: Gait Recognition Using Skeleton Maps

Multimodal People Re-Identification Using 3D Skeleton, Depth, and Color Information

GaitPT: Skeletons Are All You Need For Gait Recognition

Tran-GCN: A Transformer-Enhanced Graph Convolutional Network for Person Re-Identification in Monitoring Videos

Multi-Modal Transformer with Skeleton and Text for Action Recognition

Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network

Disentangling Modality and Posture Factors: Memory-Attention and Orthogonal Decomposition for Visible-Infrared Person Re-Identification

Graph-aware transformer for skeleton-based action recognition

Attention-based Shape and Gait Representations Learning for Video-based Cloth-Changing Person Re-Identification

SkeleTR: Towrads Skeleton-based Action Recognition in the Wild

3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition