Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification

Jun Chen, Binghao Yang, Mang Ye
DOI: https://doi.org/10.1109/CVPR52733.2024.01596
2024-06-16
Computer Vision and Pattern Recognition
Abstract: Unsupervised visible-infrared person re-identification (US-VI-ReID) centers on learning a cross-modality retrieval model without labels, reducing the reliance on expensive cross-modality manual annotation. Previous US-VI-ReID works gravitate toward learning cross-modality information from the deep features extracted from the final layer. However, because of interference from multiple discrepancies, relying solely on deep features is insufficient for accurately learning modality-invariant features and results in negative optimization. Shallow features from the shallow layers contain nuanced detail information that is critical for effective cross-modality learning, yet existing methods disregard them. To address these issues, we design a Shallow-Deep Collaborative Learning (SDCL) framework based on the transformer with shallow-deep contrastive learning, incorporating Collaborative Neighbor Learning (CNL) and Collaborative Ranking Association (CRA) modules. Specifically, CNL unveils the intrinsic homogeneous and heterogeneous collaboration, which is harnessed for neighbor alignment, enhancing robustness in a dynamic manner. Furthermore, CRA associates the cross-modality labels using the ranking association between shallow and deep features, furnishing valuable supervision for cross-modality learning. Extensive experiments validate the superiority of our method, which even outperforms certain supervised counterparts.
Engineering, Environmental Science, Computer Science