Abstract: With the prevalence of dual-mode cameras in surveillance systems, visible-infrared person re-identification (VI-ReID) has become an emerging topic. Existing VI-ReID studies fall roughly into three categories: directly extracting features, improving loss functions, and generating visible-infrared modalities. Generation methods avoid a shortcoming of the former two, whose models are generally vulnerable to parameter changes. However, these generation methods usually operate in the spatial domain and inevitably damage the original information of images. To tackle these limitations, we propose a novel frequency-domain simulated multispectral (FSMS) modality and visible-FSMS-infrared collaborative learning. The FSMS modality consists of three-channel images generated by a channel-level reconstruction of visible images, primarily based on the nonsubsampled contourlet transform (NSCT) in cooperation with a lightweight network. The generation exploits crucial spectral and edge information contained in the frequency domain. We then design a multi-modality network to conduct the tri-modality collaborative learning, in which the FSMS modality serves as an intermediate, thereby preserving the original spatial structure of images. Additionally, a dynamic-weight tri-modality heterogeneous retrieval (THR) loss and a modality-shared classification (MSI) loss are devised to mine discriminative modality-invariant features. A cross-modality invariant (CMI) constraint for further exploring triplet-wise relationships and an intra-modality regularizer for relatively stable convergence are also introduced. Finally, experimental results show that our algorithm significantly outperforms the latest state-of-the-art methods by 5.7% and 4.4% in CMC-1 accuracy on two mainstream benchmark datasets, respectively. The reasons underlying the observed performance gain are discussed in depth.

Cross-modality person re-identification via multi-task learning

Joining Features by Global Guidance with Bi-Relevance Trihard Loss for Person Re-Identification

Co-segmentation assisted cross-modality person re-identification

Cross-Modality Person Re-identification with Memory-Based Contrastive Embedding

Progressive Discriminative Feature Learning for Visible-Infrared Person Re-Identification

Feature separation and double causal comparison loss for visible and infrared person re-identification

Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning

On exploring pose estimation as an auxiliary learning task for Visible–Infrared Person Re-identification

Cross-Modality Transformer With Modality Mining for Visible-Infrared Person Re-Identification

Bridging the Gap: Multi-level Cross-modality Joint Alignment for Visible-infrared Person Re-identification

Learning Progressive Modality-shared Transformers for Effective Visible-Infrared Person Re-identification

Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification

Dual adaptive alignment and partitioning network for visible and infrared cross-modality person re-identification

Joint Color-irrelevant Consistency Learning and Identity-aware Modality Adaptation for Visible-infrared Cross Modality Person Re-identification

Deep Multi-Patch Matching Network for Visible Thermal Person Re-Identification

Visible-Infrared Person Re-Identification Based on Frequency-Domain Simulated Multispectral Modality for Dual-Mode Cameras

Discover Cross-Modality Nuances for Visible-Infrared Person Re-Identification

Multi-stage Feature Interaction Network for Masked Visible-thermal Person Re-identification

Cross modality person re-identification via mask-guided dynamic dual-task collaborative learning

Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification