Abstract: Text–image person re-identification (TIReID) aims to retrieve target pedestrians using textual descriptions. Owing to its versatility, TIReID has attracted increasing attention. However, manually annotating textual descriptions and identity labels is time-consuming and costly, which limits its scalability in practical settings. Privacy concerns and poor data storage can also cause data to be lost or rendered unusable, further exacerbating these challenges in real-world scenarios. To address these limitations, we propose, for the first time, incomplete text–image person re-identification (iTIReID), a setting that comprises a small amount of complete paired data and a large amount of incomplete data, with no identity labels available. We introduce a novel Contrastive Completing Learning (CCL) framework for iTIReID, consisting of two stages: Pure Contrastive Learning (PCL) and Feature Completion Contrastive Learning (FCCL). In PCL, only complete paired data are used for training; this stage provides a preliminary improvement of the model's capacity and prepares it for the subsequent feature completion stage. In FCCL, available features are used to complete the missing modality features, enabling effective training with incomplete data. During this process, a Cross-modal Semantic Measure (CSM) is proposed that uses intra-modality similarity to measure cross-modal similarity and selects the features with the highest semantic similarity, thereby circumventing the modality discrepancy. Semantic-Weighted Generation (SWG) is proposed to generate approximate features by weighting the selected similar features according to their semantic similarity. To fully leverage paired data for label-free training, we introduce a contrastive CMPM (cross-modal projection matching) loss, CCMPM, which enables weakly supervised contrastive learning. Experimental results verify the effectiveness of the proposed methods and show performance competitive with fully supervised methods trained on complete data.
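The abstract describes CSM, SWG, and the CCMPM loss only in words. The NumPy sketch below illustrates one plausible reading of them and is not the authors' implementation. Assumptions: features are L2-normalized and compared by cosine similarity; CSM keeps the top-k complete pairs ranked by image-image similarity; SWG weights their paired text features with a softmax using a temperature `tau`; and, lacking identity labels, the CMPM-style loss treats each pair as its own class (identity target matrix). The names `complete_text_feature`, `ccmpm_loss`, `k`, and `tau` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit L2 norm so dot products equal cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def complete_text_feature(img_query, bank_img, bank_txt, k=3, tau=0.1):
    """Hypothetical CSM + SWG sketch for an image whose text is missing.

    CSM (assumed form): rank complete pairs by *image-image* similarity only,
    so the modality gap never enters the ranking, and keep the top k.
    SWG (assumed form): return the softmax-weighted average of the paired
    *text* features of those k pairs.
    """
    sims = l2_normalize(bank_img) @ l2_normalize(img_query)  # shape (n,)
    top = np.argsort(-sims)[:k]                # indices of the k nearest pairs
    w = np.exp(sims[top] / tau)
    w = w / w.sum()                            # similarity weights summing to 1
    return l2_normalize(w @ bank_txt[top])     # approximate text feature, (d,)


def ccmpm_loss(img, txt, eps=1e-8):
    """CMPM-style KL matching loss, label-free variant (assumption).

    Without identity labels, the true matching matrix is taken to be the
    identity: sample i matches only its own paired feature.
    """
    q = np.eye(img.shape[0])

    def one_direction(a, b):
        # Project a onto the normalized b, then softmax over each row.
        logits = a @ l2_normalize(b).T
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p = p / p.sum(axis=1, keepdims=True)
        # KL(p || q) per row, averaged over the batch.
        return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1))

    return one_direction(img, txt) + one_direction(txt, img)


# Toy demo: 8 complete pairs of 16-d features; each text is its image plus noise.
rng = np.random.default_rng(0)
n, d = 8, 16
bank_img = l2_normalize(rng.normal(size=(n, d)))
bank_txt = l2_normalize(bank_img + 0.1 * rng.normal(size=(n, d)))

# An image-only sample whose text is missing: a perturbed copy of pair 0.
query = bank_img[0] + 0.05 * rng.normal(size=d)
pseudo_txt = complete_text_feature(query, bank_img, bank_txt, k=3)
print(pseudo_txt.shape)  # (16,)

# Correctly aligned pairs should give a lower loss than mismatched ones.
print(ccmpm_loss(bank_img, bank_txt), ccmpm_loss(bank_img, bank_txt[::-1]))
```

Ranking candidates by image-image rather than image-text similarity reflects the stated purpose of CSM: both features being compared lie in the same modality, so the selection is not distorted by the cross-modal discrepancy. The generated `pseudo_txt` can then stand in for the missing text feature when computing the contrastive loss.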

Parallel Data Augmentation for Text-based Person Re-identification

Hierarchical and Efficient Learning for Person Re-Identification

Occluded person re-identification based on parallel triplet augmentation and parameter-free token spatial attention

Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification

Text-Based Person Search with Limited Data

Noisy-Correspondence Learning for Text-to-Image Person Re-identification

CPCL: Cross-Modal Prototypical Contrastive Learning for Weakly Supervised Text-based Person Re-Identification

Text-augmented Multi-Modality contrastive learning for unsupervised visible-infrared person re-identification

Foreground-guided textural-focused person re-identification

DualFocus: Integrating Plausible Descriptions in Text-based Person Re-identification

Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval

PCNET: Parallelly Conquer the Large Variance of Person Re-Identification

Pose-Guided Feature Alignment for Occluded Person Re-Identification

Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

Part Representation Learning with Teacher-Student Decoder for Occluded Person Re-identification

Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments

Contrastive completing learning for practical text–image person ReID: Robuster and cheaper

Cross-modality neighbor constraints based unbalanced multi-view text–image re-identification

TextAug: Test time Text Augmentation for Multimodal Person Re-identification

ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification

Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification