Abstract: Recent research has shown that pre-training on large-scale person images extracted from internet videos is an effective way to learn better representations for person re-identification. However, these studies are mostly confined to pre-training at the instance level or the single-video tracklet level. They ignore the identity-invariance in images of the same person across different videos, which is a key focus in person re-identification. To address this issue, we propose a Cross-video Identity-cOrrelating pre-traiNing (CION) framework. Defining a noise concept that considers both intra-identity consistency and inter-identity discrimination, CION seeks the identity correlation from cross-video images by modeling it as a progressive multi-level denoising problem. Furthermore, an identity-guided self-distillation loss is proposed to implement better large-scale pre-training by mining the identity-invariance within person images. We conduct extensive experiments to verify the superiority of CION in terms of efficiency and performance. CION achieves leading performance with fewer training samples. For example, compared with the previous state of the art (ISR), CION with the same ResNet50-IBN achieves higher mAP of 93.3% and 74.3% on Market1501 and MSMT17, while using only 8% of the training samples. Finally, with CION demonstrating superior model-agnostic ability, we contribute a model zoo named ReIDZoo to meet diverse research and application needs in this field. It contains a series of CION pre-trained models spanning a range of structures and parameter counts, totaling 32 models with 10 different structures, including GhostNet, ConvNext, RepViT, FastViT and so on. The code and models will be made publicly available at https://github.com/Zplusdragon/CION_ReIDZoo.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the neglect of cross-video identity invariance in pre-training for person re-identification (ReID). Existing pre-training methods are mostly limited to instance-level or single-video tracklet-level learning, so they ignore the identity invariance of the same person in images across different videos. This identity invariance is crucial for person re-identification. The authors therefore propose a Cross-video Identity-cOrrelating pre-traiNing (CION) framework, which models the search for identity correlations across videos as a progressive multi-level denoising problem and introduces an identity-guided self-distillation loss for better large-scale pre-training.

### Main Contributions

1. **Proposing the CION Framework**: This framework explicitly explores the identity invariance in person images extracted from internet videos through progressive multi-level denoising and identity-guided self-distillation, thereby improving the effectiveness of representation learning.
2. **Experimental Validation**: Extensive experiments validate the superiority of CION in terms of efficiency and performance. CION achieves better performance than existing methods with fewer training samples. For example, CION with ResNet50-IBN achieves 93.3% mAP on Market1501 and 74.3% mAP on MSMT17 while using only 8% of the training samples.
3. **Contributing the ReIDZoo Model Library**: To meet diverse research and application needs, the authors built a model library, ReIDZoo, which includes 32 CION pre-trained models with different structures and parameter counts, covering 10 different architectures such as GhostNet, ConvNext, RepViT, and FastViT.

### Method Overview

1. **Noise Definition**: The authors define a noise concept that considers both intra-identity consistency and inter-identity discrimination.
2. **Progressive Multi-level Denoising**: Through single-tracklet denoising, short-range single-video denoising, and long-range cross-video denoising, the noise in the sample set is gradually reduced to seek better identity correlations.
3. **Identity-guided Self-distillation**: Using the identity correlation information, the student network is trained to match the output probability distribution of the teacher network through contrastive learning and self-distillation, thereby better learning identity invariance.

### Experimental Results

- **Supervised Person Re-identification**: On the Market1501 and MSMT17 datasets, CION outperforms existing state-of-the-art methods while using fewer training samples.
- **Unsupervised Person Re-identification**: In both unsupervised domain adaptation (UDA) and unsupervised learning (USL) settings, CION achieves new state-of-the-art performance, surpassing all previous methods.

In summary, by proposing the CION framework, this paper addresses the neglect of cross-video identity invariance in person re-identification pre-training and improves both the performance and the efficiency of the resulting models.
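
The identity-guided self-distillation step described in the Method Overview can be illustrated with a small sketch. This is not the authors' implementation: it assumes a DINO-style formulation in which the student's output distribution for one image is pulled toward the teacher's output distribution for a different image of the same identity. The function names, the temperatures, and the all-pairs matching scheme are hypothetical choices made only for illustration.

```python
import math

def softmax(logits, temperature):
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits, temperature):
    """Numerically stable log-softmax with a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = math.log(sum(math.exp(s - m) for s in scaled))
    return [s - m - log_total for s in scaled]

def identity_guided_distillation_loss(student_logits, teacher_logits, identity_ids,
                                      student_temp=0.1, teacher_temp=0.04):
    """Mean cross-entropy between teacher and student outputs over all ordered
    pairs of distinct images that share an identity label.
    The temperatures are assumed values, not taken from the paper."""
    total, pairs = 0.0, 0
    for i, id_i in enumerate(identity_ids):
        teacher_probs = softmax(teacher_logits[i], teacher_temp)
        for j, id_j in enumerate(identity_ids):
            if i == j or id_i != id_j:
                continue  # only match different images of the same identity
            student_log_probs = log_softmax(student_logits[j], student_temp)
            total += -sum(t * lp for t, lp in zip(teacher_probs, student_log_probs))
            pairs += 1
    return total / pairs if pairs else 0.0

# Toy batch: two identities, two images each, three output dimensions.
ids = ["a", "a", "b", "b"]
teacher = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 4.0, 0.0]]
consistent_student = [row[:] for row in teacher]
inconsistent_student = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0],
                        [0.0, 4.0, 0.0], [4.0, 0.0, 0.0]]

loss_consistent = identity_guided_distillation_loss(consistent_student, teacher, ids)
loss_inconsistent = identity_guided_distillation_loss(inconsistent_student, teacher, ids)
print(loss_consistent < loss_inconsistent)
```

In this toy batch the loss is lower when the student gives the same output for both images of an identity, which is the identity-invariance behavior the summary attributes to the loss. How the teacher is updated and how identity labels are obtained from the denoising stage are not specified in the text above and are left out of the sketch.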

Cross-video Identity Correlating for Person Re-identification Pre-training

Instance Hard Triplet Loss for In-video Person Re-identification

Person Re-identification Based on Transform Algorithm

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

Contribution-Based Multi-Stream Feature Distance Fusion Method with $k$-Distribution Re-Ranking for Person Re-Identification

Joining Features by Global Guidance with Bi-Relevance Trihard Loss for Person Re-Identification

RETRACTED CHAPTER: Person Re-identification Based on Transform Algorithm

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Unleashing Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification

Inter-Camera Identity Discrimination for Unsupervised Person Re-Identification

Unified pre-training with pseudo infrared images for visible-infrared person re-identification

Boosting Person Re-Identification with Viewpoint Contrastive Learning and Adversarial Training

Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Scale-invariant siamese network for person re-identification

Multi-level Similarity Perception Network for Person Re-identification

Rethinking the Distribution Gap of Person Re-identification with Camera-Based Batch Normalization

Learning Generalisable Omni-Scale Representations for Person Re-Identification

Cross Domain Knowledge Transfer for Person Re-identification.

AA-RGTCN: Reciprocal Global Temporal Convolution Network with Adaptive Alignment for Video-Based Person Re-Identification