Abstract: With the prevalence of dual-mode cameras in surveillance systems, visible-infrared person re-identification (VI-ReID) has become an emerging topic. Existing VI-ReID studies roughly fall into three categories: directly extracting features, improving loss functions, and generating visible-infrared modalities. Generation methods avoid a shortcoming of the former two, namely that the trained models are generally sensitive to parameter changes. However, these generation methods usually operate in the spatial domain and inevitably damage the original information of the images. To tackle these limitations, we propose a novel frequency-domain simulated multispectral (FSMS) modality and visible-FSMS-infrared collaborative learning. The FSMS modality consists of three-channel images generated by a channel-level reconstruction of visible images, primarily based on the nonsubsampled contourlet transform (NSCT) in cooperation with a lightweight network. The generation exploits crucial spectral and edge information contained in the frequency domain. We then design a multi-modality network to conduct tri-modality collaborative learning in which the FSMS modality serves as an intermediate, thereby preserving the original spatial structure of the images. Additionally, a dynamic-weight tri-modality heterogeneous retrieval (THR) loss and a modality-shared classification (MSI) loss are devised to mine discriminative modality-invariant features. A cross-modality invariant (CMI) constraint for further exploring triplet-wise relationships and an intra-modality regularizer for relatively stable convergence are also introduced. Finally, experimental results show that our algorithm significantly outperforms the latest state-of-the-art methods by 5.7% and 4.4% in CMC-1 accuracy on two mainstream benchmark datasets, respectively. The reasons underlying the observed performance gains are also discussed in depth.
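
The CMC-1 figure quoted in the abstract is a rank-1 matching rate: the fraction of query images whose nearest gallery image shares the query's identity. The sketch below illustrates that computation under simplifying assumptions. It uses cosine distance, ignores the camera and trial protocols that VI-ReID benchmarks normally impose, and the function name `cmc_rank_k` and the toy arrays are hypothetical, not taken from the paper.

```python
import numpy as np

def cmc_rank_k(query_feats, query_ids, gallery_feats, gallery_ids, k=1):
    """Fraction of queries with a correct identity among the k nearest gallery items."""
    # L2-normalise so the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dist = 1.0 - q @ g.T                          # (num_query, num_gallery)
    topk = np.argsort(dist, axis=1)[:, :k]        # indices of the k closest gallery items
    hits = (gallery_ids[topk] == query_ids[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data: three queries (e.g. infrared) and three gallery items (e.g. visible).
query_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_ids = np.array([0, 1, 2])
gallery_feats = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, -1.0]])
gallery_ids = np.array([0, 1, 2])

rank1 = cmc_rank_k(query_feats, query_ids, gallery_feats, gallery_ids, k=1)
rank3 = cmc_rank_k(query_feats, query_ids, gallery_feats, gallery_ids, k=3)
```

In this toy setup the first two queries retrieve the correct identity at rank 1 and the third does not, so `rank1` is 2/3, while `rank3` is 1.0 because every identity appears somewhere in the three-item gallery.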

Visible-Infrared Person Re-Identification Based on Frequency-Domain Simulated Multispectral Modality for Dual-Mode Cameras