Spectral Enhancement and Pseudo-Anchor Guidance for Infrared-Visible Person Re-Identification

Yiyuan Ge, Zhihao Chen, Ziyang Wang, Jiaju Kang, Mingya Zhang
2024-12-26
Abstract: The development of deep learning has facilitated the application of person re-identification (ReID) technology in intelligent security. Visible-infrared person re-identification (VI-ReID) aims to match pedestrians across infrared and visible modality images, enabling 24-hour surveillance. Current studies rely on unsupervised modality transformations and inefficient embedding constraints to bridge the spectral differences between infrared and visible images, which limits their potential performance. To tackle these limitations, this paper introduces a simple yet effective Spectral Enhancement and Pseudo-anchor Guidance Network, named SEPG-Net. Specifically, we propose a more homogeneous spectral enhancement scheme based on frequency-domain information and greyscale space, which avoids the information loss typically caused by inefficient modality transformations. Further, a Pseudo Anchor-guided Bidirectional Aggregation (PABA) loss is introduced to bridge local modality discrepancies while better preserving discriminative identity embeddings. Experimental results on two public benchmark datasets demonstrate the superior performance of SEPG-Net against other state-of-the-art methods. The code is available at https://github.com/1024AILab/ReID-SEPG.
Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing
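The spectral enhancement scheme named in the abstract can be illustrated with a minimal NumPy sketch. The sketch follows the SEG equations given in the summary that follows: grayscale conversion \(\tau(\cdot)\) with weights (0.299, 0.587, 0.114), a three-channel copy, and \(Y_{seg} = \mathrm{IFFT}(X_{pha}) + Y_{grey3}\). It is not the authors' code from the linked repository. It rests on several assumptions:

- Images are float arrays in (H, W, 3) layout.
- The FFT is applied per channel.
- "IFFT of the phase" means a phase-only reconstruction, with the amplitude set to 1 and the original phase kept.
- The two terms are added without any scaling or normalization.

All function names are hypothetical.

```python
import numpy as np

# Weights (alpha, beta, gamma) for R, G, B, as listed in the summary.
GREY_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_grey3(y_vis: np.ndarray) -> np.ndarray:
    """tau(.) then 'copy': (H, W, 3) RGB -> (H, W, 3) replicated grayscale."""
    y_grey = y_vis @ GREY_WEIGHTS  # weighted sum over the channel axis -> (H, W)
    return np.repeat(y_grey[..., None], 3, axis=-1)  # Y_grey3


def phase_only_reconstruction(y_vis: np.ndarray) -> np.ndarray:
    """Assumed reading of IFFT(X_pha): keep the phase, set the amplitude to 1."""
    spectrum = np.fft.fft2(y_vis, axes=(0, 1))  # per-channel 2-D FFT
    x_pha = np.angle(spectrum)  # phase spectrum X_pha (the amplitude is not used)
    recon = np.fft.ifft2(np.exp(1j * x_pha), axes=(0, 1))
    return recon.real  # imaginary part is round-off for real-valued input


def semantically_enhanced_grey(y_vis: np.ndarray) -> np.ndarray:
    """Y_seg = IFFT(X_pha) + Y_grey3, with no extra scaling (an assumption)."""
    return phase_only_reconstruction(y_vis) + to_grey3(y_vis)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_vis = rng.random((8, 6, 3))  # stand-in for a float RGB image in [0, 1)
    print(semantically_enhanced_grey(y_vis).shape)  # (8, 6, 3)
```

A phase-only reconstruction is known to retain much of an image's edge and contour structure. This matches the stated purpose of enhancing the contour representation. How the actual implementation scales or normalizes the two terms is not stated here.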
What problem does this paper attempt to address?
This paper addresses the challenges of visible-infrared cross-modal person re-identification (VI-ReID), specifically the following problems:

1. **Large spectral differences between modalities**: Visible-light and infrared images differ significantly in their spectra. This makes it difficult to match images of the same pedestrian across modalities.
2. **Limitations of existing methods**:
   - Methods based on generative adversarial networks (GANs) are unstable during unsupervised modality conversion and are prone to losing crucial modality information.
   - Other mainstream methods reduce modality differences by learning associated representations. However, they fall short in extracting modality-consistent features while maintaining identity discriminability. For example, some methods focus only on aggregating modality centers and neglect the preservation of intra-class features.

To solve these problems, the paper proposes a new method named SEPG-Net (Spectral Enhancement and Pseudo-anchor Guidance Network). Its main contributions are as follows:

1. **Spectral enhancement strategy**: By the authors' account, it is the first method to use frequency-domain and grayscale-space information simultaneously to generate semantically enhanced grayscale images (SEG). This reduces the spectral differences across modalities.
2. **Pseudo-anchor-guided bidirectional aggregation loss (PABA loss)**: A new cross-modal constraint that explores consistent representations at a fine-grained level while retaining the inherent identity-discriminative information.
3. **Experimental verification**: Experiments on two public datasets, SYSU-MM01 and RegDB, show that SEPG-Net outperforms other state-of-the-art methods.

### Working principle of SEPG-Net

#### 1. Generation of semantically enhanced grayscale images

To reduce the spectral differences between visible-light and infrared images, SEPG-Net first converts RGB images into grayscale images. It then uses the Fourier transform to extract frequency-domain information that enhances the contour representation. The steps are as follows:

- A visible-light image \(Y_{vis}\) with three channels R, G, and B is converted into a grayscale image \(Y_{grey}\) by a transformation function \(\tau(\cdot)\). The result is then copied into three channels to form \(Y_{grey3}\):

  \[ Y_{vis}(R, G, B) \xrightarrow{\tau(\cdot)} Y_{grey} \xrightarrow{\text{copy}} Y_{grey3} \]

  \[ \tau(x) = \alpha\cdot R(x)+\beta\cdot G(x)+\gamma\cdot B(x) \]

  where \((\alpha, \beta, \gamma) = (0.299, 0.587, 0.114)\).
- The phase information of the visible-light image is extracted and added to \(Y_{grey3}\) to enhance the contour representation:

  \[ X_{pha}, X_{amp} = \mathcal{F}_{FFT}(Y_{vis}) \]

  \[ Y_{seg} = \mathcal{F}_{IFFT}(X_{pha}) + Y_{grey3} \]

  Here \(X_{pha}\) and \(X_{amp}\) are the phase and amplitude spectra of \(Y_{vis}\), and \(\mathcal{F}_{FFT}\) and \(\mathcal{F}_{IFFT}\) denote the fast Fourier transform and its inverse.

#### 2. Weight-sharing two-stream network

To further reduce the modality differences between SEG and IR images, SEPG-Net adopts a weight-sharing two-stream network. This network extracts shared modality representations while retaining modality-specific cues.

#### 3. Pseudo-anchor-guided bidirectional aggregation loss (PABA loss)

The PABA loss reduces intra-class cross-modal differences by setting pseudo-anchors that attract samples from the opposite modality. The formula is as follows:

\[ L_{PABA}^{(Ma,Mb)}(i)=\frac{1}{K}\sum_{p = 1}^{K}\max\left(\left[\max D(pf_{Ma}^{i,p},f_{Mb}^{i,p})-\right.\right.