Abstract: Aiming to match person identities between daytime VISible (VIS) and nighttime Near-InfraRed (NIR) images, VIS-NIR re-identification (Re-ID) has attracted increasing attention due to its wide applications in low-light scenes. However, dramatic modality discrepancies between VIS and NIR images lead to a considerable intra-class gap in the feature space, which hinders identity matching. To bridge the modality gap, we propose Tri-level Modality-information Disentanglement (TMD), which disentangles modality information at three levels: raw image, feature distribution, and instance features. Our model consists of three key modules, the Style-Aligned Converter (SAC), the Two-Steps Wasserstein Loss (TSWL), and Self-supervised Orthogonal Disentanglement (SOD), which handle modality information at these three levels respectively. First, to reduce the modality discrepancy at the image level, the SAC generates style-aligned images using a purpose-designed style converter and an $\mathcal{A}$-distance learning approach. The SAC effectively alleviates the style discrepancy between VIS and NIR images with a negligible increase in model complexity. Second, because structure- and style-misaligned raw images make the VIS and NIR feature distributions heterogeneous, we propose the TSWL to reduce the VIS-NIR gap at the distribution level through two distribution alignment steps. Specifically, after generating style-consistent images, we eliminate the modality-related discrepancy by aligning the distributions of the structure-aligned original and generated VIS/NIR images, and we bridge the modality-unrelated gap by aligning the distributions of the style-consistent generated VIS and NIR images. Third, to further reduce the modality discrepancy at the instance level, the SOD imposes orthogonal constraints between the extracted modality-related and identity-related features. Since the modality-related factors are disentangled from the instance features, the proposed TMD efficiently learns modality-unrelated and identity-discriminative representations, which are well suited to person Re-ID on VIS-NIR images. Comprehensive experiments on two cross-modality pedestrian Re-ID datasets demonstrate the effectiveness of TMD.
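
As a rough illustration of the distribution-level and instance-level ideas named in the abstract, the following toy sketch computes a one-dimensional Wasserstein-1 distance between two equal-sized samples and an orthogonality penalty between paired feature vectors. It is not the authors' implementation: the abstract gives no formulas, so the function names (`wasserstein_1d`, `orthogonality_penalty`), the 1-D simplification, and the use of mean squared cosine similarity as the penalty are all assumptions made here for illustration.

```python
import math


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical samples of equal size.

    For equal-sized samples this equals the mean absolute difference
    between the sorted values. (Toy stand-in for a distribution
    alignment loss; the paper's TSWL is not specified in the abstract.)
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small epsilon avoids division by zero


def orthogonality_penalty(modality_feats, identity_feats):
    """Mean squared cosine similarity over paired feature vectors.

    Close to 0 when each modality feature is orthogonal to its paired
    identity feature, close to 1 when they are collinear. (Assumed form
    of an orthogonal constraint, not taken from the paper.)
    """
    sims = [cosine(m, i) ** 2 for m, i in zip(modality_feats, identity_feats)]
    return sum(sims) / len(sims)
```

In a training loop, quantities like these would typically be minimized alongside an identity classification loss, so that the two feature distributions move closer together while the modality-related and identity-related features are pushed toward orthogonality.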
