Abstract: Text–image person re-identification (TIReID) retrieves target pedestrians from textual descriptions. Owing to its versatility, TIReID has attracted increasing attention. However, manually annotating textual descriptions and identity labels is time-consuming and costly, which limits scalability in practical settings. Privacy concerns and unreliable data storage can also leave data lost or unusable, further compounding the challenges in real-world scenarios. To address these limitations, we propose, for the first time, the incomplete text–image person re-identification (iTIReID) setting, in which the data comprise a small amount of complete pairwise data and a large amount of incomplete data, and no identity labels are available. We introduce a novel Contrastive Completing Learning (CCL) framework for iTIReID, consisting of two stages: Pure Contrastive Learning (PCL) and Feature Completion Contrastive Learning (FCCL). In PCL, only the complete pairwise data is used for training, which gives the model a preliminary boost in capacity and prepares it for the subsequent feature completion stage. In FCCL, the available features are used to complete the missing modality features, enabling effective training with incomplete data. Within this stage, the Cross-modal Semantic Measure (CSM) uses intra-modality similarity as a proxy for cross-modal similarity and selects the features with the highest semantic similarity, thereby circumventing the modality discrepancy. Semantic-Weighted Generation (SWG) then generates approximate features by weighting the selected similar features according to their semantic similarity. To fully leverage pairwise data for label-free training, we introduce the contrastive CMPM (CCMPM) loss, which enables weakly supervised training through contrastive learning. Experimental results verify the effectiveness of the proposed methods and demonstrate performance competitive with fully supervised methods trained on complete data.
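
The feature completion idea described above (selecting semantically similar samples within one modality, then generating the missing feature as a similarity-weighted combination) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `complete_missing_text_feature`, the use of cosine similarity, the top-`k` selection, and the softmax `temperature` are all assumptions made for illustration only.

```python
import numpy as np


def complete_missing_text_feature(query_img, bank_img, bank_txt, k=3, temperature=0.1):
    """Approximate the text feature of an image whose description is missing.

    Hypothetical sketch: `bank_img` and `bank_txt` hold features of the
    complete image-text pairs (row i of each belongs to the same pair).
    """
    # Intra-modality similarity: compare the query image only with other
    # images, so no cross-modal comparison is needed (assumed cosine similarity).
    q = query_img / np.linalg.norm(query_img)
    b = bank_img / np.linalg.norm(bank_img, axis=1, keepdims=True)
    sims = b @ q

    # Keep the k complete pairs whose images are most similar to the query.
    top = np.argsort(sims)[::-1][:k]

    # Turn the selected similarities into weights (assumed softmax weighting).
    logits = sims[top] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Weighted sum of the paired text features gives the approximate feature.
    return weights @ bank_txt[top], top, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bank_img = rng.normal(size=(8, 16))   # image features of complete pairs
    bank_txt = rng.normal(size=(8, 32))   # their paired text features
    query = bank_img[2] + 0.01 * rng.normal(size=16)
    feature, neighbours, weights = complete_missing_text_feature(query, bank_img, bank_txt)
    print(feature.shape, neighbours, weights.round(3))
```

The same pattern would apply symmetrically to a text sample whose image is missing, by swapping the roles of the two feature banks.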

Contrastive completing learning for practical text–image person ReID: Robuster and cheaper
