Abstract: Group re-identification (re-ID) aims to match groups containing the same people across different cameras; its main challenges are changes in group membership and group layout. Most existing methods use the k-nearest neighbor algorithm to update node features to account for changes in group membership, but these methods cannot handle changes in group layout. To this end, we propose a novel vision transformer based random walk framework for group re-ID. Specifically, we design a vision transformer based on a monocular depth estimation algorithm that constructs a graph from the average depth values of pedestrian features, so that the impact of camera distance on the relationships between group members is taken into account. In addition, we propose a random walk module that reconstructs the graph by calculating affinity scores between target and gallery images, removing pedestrians who do not belong to the current group. Experimental results show that our framework is superior to most methods.

What problem does this paper attempt to address?

This paper addresses two major challenges in **group re-identification (group re-ID)**: **group layout change** and **group member change**.

1. **Group layout change**: Because camera viewpoints differ, the relative positions of group members can vary significantly from one view to another. Methods that assume a fixed layout therefore struggle to match the same group.
2. **Group member change**: Members may join or leave the group between views, which makes matching harder still.

Most existing methods use the k-nearest-neighbor algorithm to update node features to account for group member change, but they do not address group layout change.

To solve these problems, the authors propose a Vision Transformer based random walk framework with two main contributions:

- **Vision Transformer based on monocular depth estimation**: The depth values of pedestrians are embedded into the Vision Transformer and used to construct a graph, so that the influence of camera distance on the relationships between group members is taken into account.
- **Random walk module**: The graph is reconstructed by calculating affinity scores between the target image and the gallery images, and pedestrians who do not belong to the current group are removed, in order to handle changes in group membership and layout.

According to the experimental results, the framework performs well on three group re-identification datasets and is superior to most existing methods.
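The depth-based graph construction described above can be illustrated with a minimal sketch. The summary only states that the graph is built from average depth values of pedestrians; everything else here is an assumption made for illustration: the function names `mean_depths` and `depth_graph`, the use of bounding boxes over a precomputed depth map, and the rule that two pedestrians are connected when their mean depths differ by at most `max_gap`. It is not the paper's actual construction.

```python
import numpy as np

def mean_depths(depth_map, boxes):
    """Average depth inside each pedestrian box.

    depth_map: 2-D array from some monocular depth estimator (assumed given).
    boxes: iterable of (x1, y1, x2, y2) pixel coordinates (hypothetical format).
    """
    return np.array([depth_map[y1:y2, x1:x2].mean() for x1, y1, x2, y2 in boxes])

def depth_graph(depths, max_gap):
    """Boolean adjacency matrix linking pedestrians at similar camera distance.

    Assumed rule: connect i and j when |depth_i - depth_j| <= max_gap.
    """
    d = np.asarray(depths, dtype=float)
    gap = np.abs(d[:, None] - d[None, :])  # pairwise depth differences
    adj = gap <= max_gap
    np.fill_diagonal(adj, False)           # no self-loops
    return adj

# Toy example: two pedestrians close to the camera, one far away.
depth_map = np.zeros((4, 6))
depth_map[:, 0:2] = 1.0
depth_map[:, 2:4] = 1.2
depth_map[:, 4:6] = 5.0
boxes = [(0, 0, 2, 4), (2, 0, 4, 4), (4, 0, 6, 4)]

depths = mean_depths(depth_map, boxes)
adj = depth_graph(depths, max_gap=0.5)
```

In this toy case the first two pedestrians are linked and the third is isolated, which is the kind of distance-aware grouping the summary attributes to the depth information.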
### Formula display

Some of the formulas involved in the paper are as follows:

- **Random walk operation**:

  \[ y(t + 1) = W y(t) \]

  where \(y(t)\) is the vector of similarity scores between the probe image and all gallery images at the \(t\)-th random walk iteration, and \(W\) is the normalized similarity matrix.

- **Normalized similarity matrix**:

  \[ W(i, j) = \frac{\exp(S(i, j))}{\sum_{j \neq i} \exp(S(i, j))} \]

  where \(S(i, j)\) is an entry of the matrix of similarity scores between the probe sequence and the gallery images.

- **Attention weight calculation**:

  \[ a_{ij} = \text{softmax}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{(i, k) \in E_s} \exp(e_{ik})} \]

  where \(e_{ij}\) is the unnormalized score of edge \((i, j)\) and the sum runs over the edges \((i, k)\) in the edge set \(E_s\).

These formulas define the attention weighting and the random walk refinement that the framework uses to handle changes in group membership and layout.
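The three formulas above can be tried out numerically with a short NumPy sketch. The function names (`masked_row_softmax`, `normalized_affinity`, `random_walk`) and the toy matrix are hypothetical. Two assumptions go beyond the text: the diagonal of \(W\) is set to zero, since the denominator excludes \(j = i\), and every node is assumed to have at least one edge. The sketch follows the formulas as written and is not the paper's implementation.

```python
import numpy as np

def masked_row_softmax(scores, mask):
    """Row-wise softmax restricted to entries where mask is True.

    Covers both formulas: a_ij (mask = edge set E_s) and W(i, j)
    (mask = all off-diagonal entries). Assumes each row has at least
    one True entry in mask.
    """
    scores = np.asarray(scores, dtype=float)
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(masked)                                    # exp(-inf) == 0
    return e / e.sum(axis=1, keepdims=True)

def normalized_affinity(S):
    """W(i, j) = exp(S(i, j)) / sum_{j != i} exp(S(i, j)), with W(i, i) = 0 (assumed)."""
    n = np.asarray(S).shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    return masked_row_softmax(S, off_diagonal)

def random_walk(W, y0, steps=1):
    """Iterate y(t + 1) = W y(t) for a fixed number of steps."""
    y = np.asarray(y0, dtype=float)
    for _ in range(steps):
        y = W @ y
    return y

# Toy similarity scores among three gallery images (made-up numbers).
S = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
W = normalized_affinity(S)

# Initial probe-to-gallery scores y(0), then two refinement steps.
y0 = np.array([0.9, 0.1, 0.2])
y2 = random_walk(W, y0, steps=2)

# Attention weights over a small edge set: node 0 links to 1 and 2.
edges = np.array([[False, True, True],
                  [True, False, False],
                  [True, False, False]])
A = masked_row_softmax(S, edges)
```

Because each row of `W` sums to one, every step replaces a gallery image's score with a weighted average of the scores of the images it is most similar to, which is how affinities among gallery images can refine the initial probe-to-gallery ranking.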

Vision Transformer based Random Walk for Group Re-Identification
