Unified Conditional Image Generation for Visible-Infrared Person Re-Identification

Honghu Pan, Wenjie Pei, Xin Li, Zhenyu He
DOI: https://doi.org/10.1109/tifs.2024.3426335
2024-01-01
Abstract: This paper proposes a unified multi-modal image generation method to address two critical challenges in visible-infrared (VI) person re-identification (ReID): the insufficiency of training samples and the large cross-modality discrepancy. Specifically, we propose to generate cross-modal and middle-modal images to explicitly reduce the modality discrepancy, and to generate intra-modal images that serve as training samples for dataset augmentation. To this end, we adapt the conditional diffusion model for multi-modal image generation. The condition includes a binary modality indicator and a modality-irrelevant pedestrian contour, which control the target modality and the pedestrian identity, respectively. For intra-modality and cross-modality image generation, we modify the structure of the UNet to take the conditions as input, and estimate the conditional probability density by optimizing its variational lower bound. Furthermore, we devise modal discriminators and adversarial training strategies to achieve modality alignment. The middle-modality image generation method shares the same network architecture as intra- and cross-modality generation, but has its own training objectives. We define the middle modality as the distribution equidistant from the visible modality and the infrared modality. We employ adversarial training to measure the distance from the visible and infrared modalities to the middle modality, and minimize the difference between the two resulting adversarial losses, which serves as an equidistant constraint. Experimental results on SYSU-MM01 and RegDB demonstrate the effectiveness and generalization ability of the intra-modality, cross-modality, and middle-modality image generation.
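The equidistant constraint described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the losses, so everything here is an assumption made for illustration: the adversarial loss is taken to be a binary cross-entropy on discriminator outputs, the constraint is taken to be the absolute difference of the two losses, and the function names `bce`, `adversarial_distance`, and `equidistant_penalty` are hypothetical.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for a single discriminator output in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def adversarial_distance(disc_probs):
    """Mean adversarial loss, used here as a proxy for the distance between
    generated middle-modality images and one reference modality.

    disc_probs: outputs of a (hypothetical) modal discriminator, i.e. the
    probability it assigns to each generated image belonging to that modality.
    """
    return sum(bce(p, 1.0) for p in disc_probs) / len(disc_probs)


def equidistant_penalty(probs_visible, probs_infrared):
    """Assumed form of the equidistant constraint: the absolute difference
    between the visible-side and infrared-side adversarial losses."""
    return abs(adversarial_distance(probs_visible)
               - adversarial_distance(probs_infrared))


# Generated images that look equally "visible" and "infrared" incur no penalty.
balanced = equidistant_penalty([0.5, 0.5], [0.5, 0.5])
# Generated images that lean toward the visible modality incur a positive penalty.
skewed = equidistant_penalty([0.9, 0.8], [0.1, 0.2])
```

Under these assumptions the penalty is zero exactly when the two adversarial losses match, which is one way to read "equidistant"; the paper's actual objective may differ in both the loss and the distance measure.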