Abstract: Visible-infrared person re-identification aims to match the identity of the same person across different modalities. The main challenge is the modality difference between visible and infrared images. Most existing methods either use generative adversarial networks to generate compensatory images of the corresponding modality and thereby reduce the modality difference, or design various two-stream networks to learn global feature representations and extract globally shared features. However, because visible and infrared images differ substantially, the generated pseudo-modalities often fail to bridge the gap between modalities and tend to introduce noise. In addition, the extracted modality-shared features typically have weak discriminative capability, which inevitably leads to the loss of critical identity-related discriminative features and a lack of robustness to noisy images. To tackle these challenges, we introduce a modality synergy alignment learning network. The network incorporates a novel data augmentation technique, SliceMix, which mixes random sections of cross-modality images to synthesize a new sample that is both discriminative with respect to identity and robust to noise, thereby facilitating the learning of modality-invariant feature representations. By adjusting the mixing ratio, mixed modalities can be generated flexibly to minimize the impact of modality imbalance. Additionally, a modality alignment module is introduced to ensure similarity within the modality class and accentuate the differences between modalities. Moreover, we propose a data augmentation method called random channel grayscale, which enhances the network's robustness to color changes and expands data diversity. Comprehensive experiments on mainstream datasets, including SYSU-MM01 and RegDB, demonstrate that our method significantly improves the performance of cross-modality retrieval.
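The abstract does not spell out how SliceMix or random channel grayscale are implemented, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that a "section" is one contiguous band of rows, that the mixing ratio is the fraction of rows taken from the infrared image, and that random channel grayscale replicates one randomly chosen channel across all channels with probability `p`. The function names `slice_mix` and `random_channel_grayscale` and all parameters are hypothetical.

```python
import numpy as np


def slice_mix(visible, infrared, ratio=0.5, rng=None):
    """Paste one random horizontal band of `infrared` into `visible`.

    Assumption: both inputs are H x W x C arrays of identical shape and
    `ratio` is the fraction of rows taken from the infrared image.
    Returns the mixed image plus the band's start row and height.
    """
    if visible.shape != infrared.shape:
        raise ValueError("images must have the same shape")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    height = visible.shape[0]
    band = int(round(ratio * height))
    # Pick a start row so that the band fits entirely inside the image.
    start = int(rng.integers(0, height - band + 1))
    mixed = visible.copy()
    mixed[start:start + band] = infrared[start:start + band]
    return mixed, start, band


def random_channel_grayscale(image, p=0.5, rng=None):
    """With probability `p`, replicate one random channel across all channels.

    Assumption: this is one plausible reading of "random channel
    grayscale"; the image is an H x W x C array.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() >= p:
        return image.copy()
    channels = image.shape[2]
    chosen = int(rng.integers(0, channels))
    return np.repeat(image[:, :, chosen:chosen + 1], channels, axis=2)
```

Under these assumptions, a larger `ratio` gives a sample dominated by infrared content and a smaller one favors visible content, which is one way the mixing ratio could be tuned against modality imbalance.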

Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

Joint Uneven Channel Information Network with Blend Metric Loss for Person Re-Identification

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

SSN3D: Self-Separated Network to Align Parts for 3D Convolution in Video Person Re-Identification

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

Densely Semantically Aligned Person Re-Identification

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Cross-modality person re-identification via modality-synergy alignment learning

A multi-branch attention and alignment network for person re-identification

Deeply-Learned Part-Aligned Representations for Person Re-identification

Spatial and Temporal Mutual Promotion for Video-Based Person Re-Identification

Information complementary attention-based multidimension feature learning for person re-identification

Multi-Stream Refining Network for Person Re-Identification

Multi-camera Handoff for Person Re-Identification

Joint Color-irrelevant Consistency Learning and Identity-aware Modality Adaptation for Visible-infrared Cross Modality Person Re-identification

Cross-Modality Person Re-identification with Memory-Based Contrastive Embedding

Appearance and Motion Enhancement for Video-Based Person Re-Identification

Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments

Person Re-identification Based on Body Segmentation

Person re-identification with fusion of hand-crafted and deep pose-based body region features