Abstract: Text–image person re-identification (TIReID) retrieves target pedestrians from textual descriptions. Because of its versatility, TIReID has attracted increasing attention. However, manually annotating textual descriptions and identity labels is time-consuming and costly, which limits scalability in practical settings. Privacy restrictions and unreliable data storage can also leave data lost or unusable, which further complicates real-world deployment. To address these limitations, we propose, for the first time, incomplete text–image person re-identification (iTIReID), a setting in which the training set comprises a small amount of complete pairwise data and a large amount of incomplete data, and no identity labels are available. We introduce a novel Contrastive Completing Learning (CCL) framework for iTIReID, consisting of two stages: Pure Contrastive Learning (PCL) and Feature Completion Contrastive Learning (FCCL). In PCL, only the complete pairwise data is used for training, which gives the model a preliminary level of capability and prepares it for the subsequent feature completion stage. In FCCL, the available features are used to complete the missing-modality features, enabling effective training with incomplete data. Within this stage, a Cross-modal Semantic Measure (CSM) uses intra-modality similarity as a proxy for cross-modal similarity and selects the features with the highest semantic similarity, thereby circumventing the modality discrepancy. Semantic-Weighted Generation (SWG) then generates approximate features as a combination of the selected similar features, weighted by their semantic similarity. To fully exploit pairwise data for label-free training, we introduce the contrastive CMPM (CCMPM) loss, which enables weakly supervised training through contrastive learning. Experimental results verify the effectiveness of the proposed methods and show performance competitive with fully supervised methods trained on complete data.
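
The feature completion idea described for CSM and SWG can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `complete_missing_feature`, the top-`k` selection, the softmax weighting, and the `temperature` parameter are all assumptions made for illustration. The sketch assumes a bank of complete pairs, where `bank_same[i]` and `bank_other[i]` are the two modality features of the same pair, and a query that has only one modality available.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def complete_missing_feature(query, bank_same, bank_other, k=2, temperature=0.1):
    """Approximate the missing-modality feature of `query` (hypothetical helper).

    query      : feature of the available modality (e.g. image)
    bank_same  : same-modality features of the complete pairs
    bank_other : other-modality features of the same pairs, index-aligned
    """
    # CSM-like step (assumed form): compare only within the available
    # modality, so no cross-modal distance is ever computed.
    sims = [cosine(query, f) for f in bank_same]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

    # SWG-like step (assumed form): softmax over the selected similarities,
    # shifted by the maximum for numerical stability.
    max_sim = max(sims[i] for i in top)
    exps = [math.exp((sims[i] - max_sim) / temperature) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the paired other-modality features.
    dim = len(bank_other[0])
    return [
        sum(w * bank_other[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]


# Toy data: three complete pairs with 2-D "image" and 3-D "text" features.
image_bank = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
approx_text = complete_missing_feature([1.0, 0.0], image_bank, text_bank, k=2)
```

In this toy case the query image feature is closest to the first and third bank entries, so the generated text feature is a mixture of their paired text features, dominated by the first. The completed feature could then be paired with the query for contrastive training, which is the role the abstract assigns to FCCL.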

Contrastive completing learning for practical text–image person ReID: Robuster and cheaper
