Abstract: The development of deep learning has facilitated the application of person re-identification (ReID) technology in intelligent security. Visible-infrared person re-identification (VI-ReID) aims to match pedestrians across infrared and visible images, enabling 24-hour surveillance. Current studies, however, rely on unsupervised modality transformations and inefficient embedding constraints to bridge the spectral differences between infrared and visible images, which limits their potential performance. To address these limitations, this paper introduces a simple yet effective Spectral Enhancement and Pseudo-anchor Guidance Network, named SEPG-Net. Specifically, we propose a more homogeneous spectral enhancement scheme based on frequency domain information and greyscale space, which avoids the information loss typically caused by inefficient modality transformations. Further, a Pseudo Anchor-guided Bidirectional Aggregation (PABA) loss is introduced to bridge local modality discrepancies while better preserving discriminative identity embeddings. Experimental results on two public benchmark datasets demonstrate the superior performance of SEPG-Net against other state-of-the-art methods. The code is available at https://github.com/1024AILab/ReID-SEPG.

What problem does this paper attempt to address?

This paper addresses the challenges of visible-infrared cross-modality person re-identification (VI-ReID), specifically the following problems:

1. **Large spectral differences between modalities**: Visible-light and infrared images differ significantly in their spectra, which makes it difficult to match images of the same pedestrian across modalities.
2. **Limitations of existing methods**:
   - Methods based on generative adversarial networks (GANs) are unstable during unsupervised modality translation and are prone to losing crucial modality information.
   - Other mainstream methods reduce modality differences by learning associated representations, but they fall short in extracting modality-consistent features and in preserving identity discriminability. For example, some methods focus only on aggregating modality centers while ignoring the preservation of intra-class features.

To solve these problems, the paper proposes a new method named SEPG-Net (Spectral Enhancement and Pseudo-anchor Guidance Network). Its main contributions are as follows:

1. **Spectral enhancement strategy**: It is the first to use frequency-domain and grayscale-space information together to generate semantically enhanced grayscale (SEG) images, thereby reducing the spectral differences across modalities.
2. **Pseudo-anchor-guided bidirectional aggregation loss (PABA loss)**: A new cross-modality constraint is introduced that explores consistent representations at a fine-grained level while retaining the inherent identity-discriminative information.
3. **Experimental verification**: Experimental results on two public datasets, SYSU-MM01 and RegDB, show that SEPG-Net outperforms other state-of-the-art methods.

### Working principle of SEPG-Net

#### 1. Generation of semantically enhanced grayscale images

To reduce the spectral differences between visible-light and infrared images, SEPG-Net first converts RGB images into grayscale images and then uses the Fourier transform to extract frequency-domain information that enhances the contour representation. The specific steps are as follows:

- Given a visible-light image \(Y_{vis}\) with three channels R, G, and B, it is converted into a grayscale image \(Y_{grey}\) through a transformation function \(\tau(\cdot)\) and then copied into three channels to give \(Y_{grey3}\):

  \[ Y_{vis}(R, G, B) \xrightarrow{\tau(\cdot)} Y_{grey} \xrightarrow{\text{copy}} Y_{grey3} \]

  \[ \tau(x) = \alpha\cdot R(x)+\beta\cdot G(x)+\gamma\cdot B(x) \]

  where \((\alpha, \beta, \gamma) = (0.299, 0.587, 0.114)\).
- The phase information of the visible-light image is extracted to enhance the contour representation:

  \[ X_{pha}, X_{amp} = F_{FFT}(Y_{vis}) \]

  \[ Y_{seg} = F_{IFFT}(X_{pha}) + Y_{grey3} \]

#### 2. Weight-sharing two-stream network

To further reduce the modality differences between SEG and IR images, SEPG-Net adopts a weight-sharing two-stream network to extract shared modality representations while retaining modality-specific cues.

#### 3. Pseudo-anchor-guided bidirectional aggregation loss (PABA loss)

The PABA loss reduces the intra-class cross-modality differences by setting pseudo-anchors that attract samples from the opposite modality. The specific formula is as follows:

\[ L_{PABA}^{(Ma,Mb)}(i)=\frac{1}{K}\sum_{p = 1}^{K}\max\left(\left[\max D(pf_{Ma}^{i,p},f_{Mb}^{i,p})-\right.\right.
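The SEG generation in step 1 above can be illustrated with a short NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that the summary does not confirm: the image is a float array of shape (H, W, 3), the FFT is applied per channel, and the inverse transform uses the phase with unit amplitude (the summary does not say how the amplitude is handled). The function names are hypothetical, and no normalization or clipping is applied to the result.

```python
import numpy as np

# Luma weights (alpha, beta, gamma) quoted in the summary above.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_grey3(y_vis):
    """Apply tau(.) and the copy step: (H, W, 3) RGB -> (H, W, 3) grayscale."""
    y_grey = y_vis @ LUMA_WEIGHTS                      # weighted channel sum, shape (H, W)
    return np.repeat(y_grey[..., None], 3, axis=-1)    # copy into three channels


def phase_only_reconstruction(y_vis):
    """Inverse FFT of the phase spectrum alone.

    Assumption: the amplitude is replaced by 1 before the inverse transform.
    """
    spectrum = np.fft.fft2(y_vis, axes=(0, 1))         # per-channel 2-D FFT
    x_pha, x_amp = np.angle(spectrum), np.abs(spectrum)
    del x_amp                                          # amplitude is discarded in this sketch
    unit_spectrum = np.exp(1j * x_pha)                 # unit amplitude, original phase
    # For a real input the result is real up to rounding error.
    return np.real(np.fft.ifft2(unit_spectrum, axes=(0, 1)))


def semantically_enhanced_grey(y_vis):
    """Y_seg = IFFT(X_pha) + Y_grey3, following the two equations in step 1."""
    y_vis = np.asarray(y_vis, dtype=np.float64)
    return phase_only_reconstruction(y_vis) + to_grey3(y_vis)
```

Calling `semantically_enhanced_grey(img)` on an RGB array returns an array of the same shape. In practice the output would probably need rescaling before being fed to a network, which is left out here because the summary does not describe it.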

Spectral Enhancement and Pseudo-Anchor Guidance for Infrared-Visible Person Re-Identification
