VLUReID: Exploiting Vision-Language Knowledge for Unsupervised Person Re-Identification

Dongmei Zhang, Ray Zhang, Fan Yang, Yuan Li, Huizhu Jia, Xiaodong Xie, Shanghang Zhang
DOI: https://doi.org/10.1109/icme57554.2024.10688349
2024-01-01
Abstract: The superior performance of pre-trained vision-language models on various downstream tasks demonstrates the effectiveness of integrating cross-modal vision-language knowledge into visual tasks. However, this knowledge is rarely used for visual-based person re-identification (re-ID) because the datasets lack textual descriptions. Existing efforts require manual annotations for training, which are time-consuming to produce. We propose VLUReID, a framework that improves visual-based person re-ID using vision-language knowledge without requiring manual annotations of the datasets. Specifically, the Vision-to-Text Association (VTA) module uses designed textual prompts to guide the vision-language model in generating pseudo-semantic labels for visual inputs. Subsequently, within the Dual-Branch Asymmetric Training (DBAT) module, we propose an asymmetric training strategy to extract cross-modal knowledge from the pseudo-semantic labels and integrate it into the person re-ID model. Experimental results on two widely used benchmarks for unsupervised video-based person re-ID demonstrate the effectiveness of our framework.
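
The abstract does not specify how the VTA module matches visual inputs to prompts. As a rough illustration only, the sketch below assumes one common approach: each image is assigned the index of the textual prompt whose embedding is most similar to the image embedding under cosine similarity. The prompt strings, the embedding values, and the function names `cosine_similarity` and `assign_pseudo_labels` are all hypothetical and are not taken from the paper; a real system would obtain the embeddings from the image and text encoders of a vision-language model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_pseudo_labels(
    image_embeddings: Sequence[Sequence[float]],
    prompt_embeddings: Sequence[Sequence[float]],
) -> List[int]:
    """Return, for each image, the index of its most similar prompt."""
    labels = []
    for image in image_embeddings:
        sims = [cosine_similarity(image, prompt) for prompt in prompt_embeddings]
        # argmax over prompts: the best-matching prompt becomes the pseudo-label
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Hypothetical prompts and hand-made toy embeddings (assumptions for
# illustration; not the prompts or features used by VLUReID).
prompts = ["a person wearing a red shirt", "a person carrying a backpack"]
prompt_embeddings = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
image_embeddings = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.0], [0.7, 0.6, 0.1]]

pseudo_labels = assign_pseudo_labels(image_embeddings, prompt_embeddings)
for index, label in enumerate(pseudo_labels):
    print(f"image {index}: {prompts[label]}")
```

Labels produced this way need no manual annotation, which is the property the abstract emphasizes. How such labels are then used by the DBAT module's asymmetric training is not described in enough detail here to sketch.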