Abstract: RGB-Infrared person re-identification (RGB-IR ReID) aims to associate people across disjoint RGB and IR camera views. Currently, state-of-the-art performance of RGB-IR ReID is not as impressive as that of conventional ReID. Much of this gap is due to the notorious modality bias training issue introduced by single-modality ImageNet pre-training, which can yield RGB-biased representations that severely hinder cross-modality image retrieval. This paper makes the first attempt to tackle the task from a pre-training perspective. We propose a self-supervised pre-training solution, named Modality-Aware Multiple Granularity Learning (MMGL), which trains models from scratch on multi-modal ReID datasets only, yet achieves competitive results against ImageNet pre-training without using any external data or sophisticated tuning tricks. First, we develop a simple but effective 'permutation recovery' pretext task that globally maps shuffled RGB-IR images into a shared latent permutation space, providing modality-invariant global representations for downstream ReID tasks. Second, we present a part-aware cycle-contrastive (PCC) learning strategy that utilizes cross-modality cycle-consistency to maximize agreement between semantically similar RGB-IR image patches. This enables contrastive learning in unpaired multi-modal scenarios, further improving the discriminability of local features without laborious instance augmentation. Based on these designs, MMGL effectively alleviates the modality bias training problem. Extensive experiments demonstrate that it learns better representations (+8.03% Rank-1 accuracy) with faster training (converging in only a few hours) and higher data efficiency (< 5% data size) than ImageNet pre-training. The results also suggest that it generalizes well to various existing models and losses, and has promising transferability across datasets. The code will be released at https://github.com/hansonchen1996/MMGL.
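
To make the 'permutation recovery' idea more concrete, the following is a minimal sketch of how a shuffled input and its recovery target could be constructed. It is not the authors' implementation: the abstract does not specify how images are shuffled, so the choice of horizontal stripes, the function names `shuffle_stripes` and `restore_stripes`, and the parameter `num_stripes` are all assumptions made for illustration. No model is trained here; the sketch only shows the data side of such a pretext task.

```python
import numpy as np


def shuffle_stripes(image, num_stripes, rng):
    """Split an (H, W, C) image into horizontal stripes and shuffle them.

    Assumption: stripe-level shuffling stands in for whatever shuffling
    scheme the paper actually uses. Returns the shuffled image and the
    permutation, which a pretext model would be asked to recover.
    """
    height = image.shape[0]
    if height % num_stripes != 0:
        raise ValueError("image height must be divisible by num_stripes")
    stripes = np.split(image, num_stripes, axis=0)
    perm = rng.permutation(num_stripes)
    # Position j of the shuffled image holds original stripe perm[j].
    shuffled = np.concatenate([stripes[i] for i in perm], axis=0)
    return shuffled, perm


def restore_stripes(shuffled, perm):
    """Undo shuffle_stripes given the true (or a predicted) permutation."""
    stripes = np.split(shuffled, len(perm), axis=0)
    # argsort of a permutation yields its inverse permutation.
    inverse = np.argsort(perm)
    return np.concatenate([stripes[i] for i in inverse], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy inputs: a 3-channel "RGB" and a 1-channel "IR" image.
    rgb = rng.random((12, 4, 3))
    ir = rng.random((12, 4, 1))
    for name, img in (("rgb", rgb), ("ir", ir)):
        shuffled, perm = shuffle_stripes(img, num_stripes=6, rng=rng)
        recovered = restore_stripes(shuffled, perm)
        print(name, "restored exactly:", np.array_equal(recovered, img))
```

In a setup like this, the permutation is a label that comes for free from the data, so both RGB and IR images can share the same prediction target space without identity annotations or cross-modality pairing.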

Deep multimodal representation learning for generalizable person re-identification

Deep Multimodal Fusion for Generalizable Person Re-identification

Joining Features by Global Guidance with Bi-Relevance Trihard Loss for Person Re-Identification

Semi-Supervised Domain Generalizable Person Re-Identification

DMM: Dual-Modal Model for Person Re-Identification

Adaptive Domain-Specific Normalization for Generalizable Person Re-Identification

Person30K: A Dual-Meta Generalization Network for Person Re-Identification

Dual Mutual Learning for Cross-modality Person Re-identification

A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification

Adaptive multi-task learning for cross domain and modal person re-identification

Mutual Distillation Learning For Person Re-Identification

Deep High-Resolution Representation Learning for Cross-Resolution Person Re-Identification

Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification

Person Reidentification by Multiscale Feature Representation Learning With Random Batch Feature Mask

Self-Supervised Modality-Aware Multiple Granularity Pre-Training for RGB-Infrared Person Re-Identification

Leader-Based Multi-Scale Attention Deep Architecture for Person Re-Identification

Unified Multifaceted Feature Learning for Person Re-Identification

Learning to Adapt Across Dual Discrepancy for Cross-Domain Person Re-Identification

Multi-Scale Explicit Matching and Mutual Subject Teacher Learning for Generalizable Person Re-Identification

Multi-scale Deep Learning Architectures for Person Re-identification

Learning Memory-Augmented Unidirectional Metrics for Cross-modality Person Re-identification