Abstract: Lifelong person re-identification (ReID) aims to recognize people across cameras while integrating new knowledge from continuous data streams. The key challenges are mitigating catastrophic forgetting caused by parameter updates and domain shift, and maintaining performance on both seen and unseen domains. Many previous works rely on data memories to retain prior samples. However, the amount of retained data grows linearly with the number of training domains, so memory consumption increases continually. Moreover, these methods may suffer significant performance degradation when data retention is prohibited due to privacy concerns. To address these limitations, we propose using textual descriptions as guidance to encourage the ReID model to learn cross-domain invariant features without retaining samples. The key insight is that natural language can describe pedestrian instances in an invariant style, which suggests a shared textual space for all pedestrian images. By leveraging this shared textual space as an anchor, we can prompt the ReID model to embed images from various domains into a unified semantic space, thereby alleviating catastrophic forgetting caused by domain shift. To this end, we introduce a task-driven dynamic textual prompt framework. The framework features a dynamic prompt fusion module, which adaptively constructs and fuses two different textual prompts as anchors, effectively guiding the ReID model to embed images into a unified semantic space. Additionally, we design a text-visual feature alignment module to learn a more precise mapping between fine-grained visual and textual features. We also develop a learnable knowledge distillation module that allows our model to dynamically balance retaining existing knowledge with acquiring new knowledge. Extensive experiments demonstrate that our method outperforms state-of-the-art methods under various settings.
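
The idea of using text features as anchors and balancing old against new knowledge can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the CLIP-style cross-entropy over cosine similarities, the mean-squared feature distillation, and the sigmoid-gated scalar `alpha` are all assumptions chosen for illustration, and in a real system `alpha` and the features would be trained by gradient descent in a deep learning framework.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anchor_alignment_loss(image_feats, text_anchors, labels, temperature=0.07):
    """Cross-entropy over image-to-text similarities (assumed loss form).

    Each image feature is pulled toward the text anchor of its identity
    and pushed away from the anchors of other identities.
    """
    total = 0.0
    for feat, label in zip(image_feats, labels):
        logits = [cosine(feat, anchor) / temperature for anchor in text_anchors]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[label]
    return total / len(image_feats)


def distillation_loss(new_feats, old_feats):
    """Mean squared distance between current and frozen old-model features."""
    total = 0.0
    count = 0
    for new, old in zip(new_feats, old_feats):
        for a, b in zip(new, old):
            total += (a - b) ** 2
            count += 1
    return total / count


def balanced_loss(task_loss, distill_loss, alpha):
    """Blend the two losses with a weight w = sigmoid(alpha).

    `alpha` is a hypothetical learnable scalar: a larger value puts more
    weight on retaining old knowledge, a smaller one on learning new data.
    """
    w = 1.0 / (1.0 + math.exp(-alpha))
    return (1.0 - w) * task_loss + w * distill_loss


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]   # one text anchor per identity
    images = [[0.9, 0.1], [0.1, 0.9]]    # features from the current model
    old_images = [[0.8, 0.2], [0.2, 0.8]]  # features from the frozen old model
    task = anchor_alignment_loss(images, anchors, labels=[0, 1])
    distill = distillation_loss(images, old_images)
    print(balanced_loss(task, distill, alpha=0.0))
```

Because the anchors come from text rather than stored images, a sketch like this needs no sample memory: only the frozen old model's features on the current batch are required for the distillation term.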

VLUReID: Exploiting Vision-Language Knowledge for Unsupervised Person Re-Identification

When Large Vision-Language Models Meet Person Re-Identification

Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement

Exploring Part-Informed Visual-Language Learning for Person Re-Identification

Translation, Association and Augmentation: Learning Cross-Modality Re-Identification From Single-Modality Annotation

Retrieve Anyone: A General-purpose Person Re-identification Task with Instructions

VILLS: Video-Image Learning to Learn Semantics for Person Re-Identification

Leveraging Virtual and Real Person for Unsupervised Person Re-identification

Video-based Person Re-identification with Long Short-Term Representation Learning

Unsupervised Visible-Infrared ReID via Pseudo-label Correction and Modality-level Alignment

Adaptive multi-task learning for cross domain and modal person re-identification

Person Re-Identification with Joint Verification and Identification of Identity-Attribute Labels

MLLMReID: Multimodal Large Language Model-based Person Re-identification

Unifying Multi-Modal Uncertainty Modeling and Semantic Alignment for Text-to-Image Person Re-identification

Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification

CLIP-ReID: Exploiting Vision-Language Model for Image Re-identification without Concrete Text Labels

Image Re-Identification: Where Self-supervision Meets Vision-Language Learning

Dynamic Textual Prompt For Rehearsal-free Lifelong Person Re-identification

Boosting Person Re-Identification with Viewpoint Contrastive Learning and Adversarial Training

Cross-modality neighbor constraints based unbalanced multi-view text–image re-identification