VI-Diff: Unpaired Visible-Infrared Translation Diffusion Model for Single Modality Labeled Visible-Infrared Person Re-identification

Han Huang, Yan Huang, Liang Wang
2023-10-06
Abstract: Visible-Infrared person re-identification (VI-ReID) in real-world scenarios poses a significant challenge due to the high cost of cross-modality data annotation. Because different sensing cameras are used under different conditions (RGB cameras in good lighting, IR cameras in poor lighting), identifying the same person across modalities is costly and error-prone. To overcome this, we explore the use of single-modality labeled data for the VI-ReID task, which is more cost-effective and practical. By labeling pedestrians in only one modality (e.g., visible images) and retrieving in another modality (e.g., infrared images), we aim to create a training set containing both originally labeled and modality-translated data using unpaired image-to-image translation techniques. In this paper, we propose VI-Diff, a diffusion model that effectively addresses the task of Visible-Infrared person image translation. Through comprehensive experiments, we demonstrate that VI-Diff outperforms existing diffusion and GAN models, making it a promising solution for VI-ReID with single-modality labeled data and a good starting point for future study. Code will be available.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper focuses on the data annotation challenges in visible-infrared cross-modal person re-identification (VI-ReID). Specifically:

1. **Using Single-Modal Annotated Data for VI-ReID**:
   - In real-world scenarios, annotating cross-modal data is costly and error-prone, especially when matching images of the same person captured by RGB and infrared cameras. The paper proposes accomplishing the VI-ReID task with single-modal annotated data, i.e., annotating only one modality (such as visible light images) and retrieving in the other modality (such as infrared images).
2. **Proposing the VI-Diff Model**:
   - To achieve this goal, the authors propose VI-Diff, a diffusion-based model that translates visible light images to infrared images without requiring paired images. The translated images are combined with the originally labeled data to form the training set, which reduces the need for cross-modal annotation.
3. **Validating Effectiveness**:
   - The paper evaluates VI-Diff through extensive experiments, reporting that it outperforms existing diffusion and GAN models for VI-ReID with single-modal annotated data.

In summary, the paper aims to address the data annotation challenges in cross-modal person re-identification by using single-modal annotated data, and it proposes the VI-Diff model as a solution.
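
The training-set construction described above (labeled images in one modality plus their translated counterparts, which inherit the same identity labels) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Sample`, `build_training_set`, and `placeholder_translate` are hypothetical names, and the placeholder translator simply averages color channels as a stand-in for a learned translation model such as VI-Diff.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels; a toy image representation


@dataclass(frozen=True)
class Sample:
    image: Image
    person_id: int
    modality: str  # "visible" or "translated_infrared"


def placeholder_translate(image: Image) -> Image:
    """Hypothetical stand-in for a visible-to-infrared translator.

    It averages the RGB channels of each pixel. This is NOT VI-Diff;
    a real pipeline would call a trained translation model here.
    """
    return [[((r + g + b) // 3,) * 3 for (r, g, b) in row] for row in image]


def build_training_set(
    labeled_visible: Sequence[Tuple[Image, int]],
    translate: Callable[[Image], Image],
) -> List[Sample]:
    """Pair each labeled visible image with a translated copy.

    The translated image inherits the identity label of its source,
    so no infrared image has to be annotated by hand.
    """
    samples: List[Sample] = []
    for image, person_id in labeled_visible:
        samples.append(Sample(image, person_id, "visible"))
        samples.append(Sample(translate(image), person_id, "translated_infrared"))
    return samples


# Usage: one labeled visible image of person 7 yields two training samples.
toy_image: Image = [[(30, 60, 90)]]
training_set = build_training_set([(toy_image, 7)], placeholder_translate)
```

The point of the sketch is the label inheritance: every translated image carries the identity of the visible image it came from, which is what lets a re-identification model be trained across modalities with annotations from only one of them.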