Cross-modality person re-identification via modality-synergy alignment learning

Yuju Lin, Banghai Wang
DOI: https://doi.org/10.1007/s00138-024-01612-5
IF: 2.983
2024-09-28
Machine Vision and Applications
Abstract: Visible-infrared person re-identification aims to match the identity of the same person across different modalities. The main challenge is the modality difference between visible and infrared images. Most existing methods either use generative adversarial networks to generate compensatory images of the corresponding modality to reduce the modality difference, or design diverse two-stream networks to learn global feature representations and extract globally shared features. However, because the difference between the visible and infrared modalities is substantial, the generated pseudo-modalities often struggle to bridge the gap between modalities and tend to introduce noise. The extracted modality-shared features typically exhibit weak discriminative capability, which inevitably leads to the loss of critical discriminative features related to person identity and to a lack of robustness to noisy images. To tackle these challenges, we introduce a modality synergy alignment learning network. This network incorporates a novel data augmentation technique known as SliceMix, which mixes random sections of cross-modality images to synthesize a new sample that is both discriminative with respect to identity and robust to noise, thereby facilitating the learning of modality-invariant feature representations. By adjusting the mixing ratio, mixed modalities can be generated flexibly to minimize the impact of modality imbalance. Additionally, a modality alignment module is introduced to ensure similarity within the modality class and accentuate the differences between modalities. Moreover, we propose a data augmentation method called random channel grayscale, which enhances the network's robustness to color changes and expands data diversity. Comprehensive experiments on mainstream datasets, including SYSU-MM01 and RegDB, demonstrate that our method significantly improves the performance of cross-modality retrieval.
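The two augmentations named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no details, so the choice of horizontal strips, the function names `slice_mix` and `random_channel_grayscale`, and the parameters `ratio`, `num_slices`, and `p` are all assumptions made for illustration. In particular, "random channel grayscale" is read here as replacing every channel with one randomly chosen channel, which may differ from the paper's definition.

```python
import numpy as np


def slice_mix(visible, infrared, ratio=0.5, num_slices=6, rng=None):
    """Hypothetical SliceMix-style augmentation (assumed design, not the paper's).

    Splits two same-shaped (H, W, C) images into `num_slices` horizontal strips
    and copies roughly `ratio` of the strips from `infrared` into `visible`.
    """
    if visible.shape != infrared.shape:
        raise ValueError("both images must have the same shape")
    rng = np.random.default_rng() if rng is None else rng
    height = visible.shape[0]
    # Row indices that delimit the horizontal strips.
    bounds = np.linspace(0, height, num_slices + 1).astype(int)
    # Number of strips to take from the infrared image.
    num_ir = int(round(ratio * num_slices))
    ir_slices = rng.choice(num_slices, size=num_ir, replace=False)
    mixed = visible.copy()
    for i in ir_slices:
        mixed[bounds[i]:bounds[i + 1]] = infrared[bounds[i]:bounds[i + 1]]
    return mixed


def random_channel_grayscale(image, p=0.5, rng=None):
    """Hypothetical reading of 'random channel grayscale' (an assumption).

    With probability `p`, every channel of an (H, W, C) image is replaced by
    one randomly chosen channel; otherwise a copy of the image is returned.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() >= p:
        return image.copy()
    channel = int(rng.integers(image.shape[2]))
    return np.repeat(image[:, :, channel:channel + 1], image.shape[2], axis=2)
```

Under these assumptions, raising `ratio` shifts the mixed sample toward the infrared modality and lowering it shifts the sample toward the visible modality, which is one way the mixing ratio mentioned in the abstract could be used to counter modality imbalance.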
computer science, cybernetics, artificial intelligence, engineering, electrical & electronic