Abstract: To address the occlusion issues in person Re-Identification (ReID) tasks, many methods have been proposed to extract part features by introducing external spatial information. However, due to missing part appearance information caused by occlusion and noisy spatial information from external models, these purely vision-based approaches fail to correctly learn the features of human body parts from limited training data and struggle to accurately locate body parts, ultimately leading to misaligned part features. To tackle these challenges, we propose a Prompt-guided Feature Disentangling method (ProFD), which leverages the rich pre-trained knowledge in the textual modality to help the model generate well-aligned part features. ProFD first designs part-specific prompts and utilizes noisy segmentation masks to preliminarily align visual and textual embeddings, enabling the textual prompts to have spatial awareness. Furthermore, to alleviate the noise from external masks, ProFD adopts a hybrid-attention decoder, ensuring spatial and semantic consistency during the decoding process to minimize noise impact. Additionally, to avoid catastrophic forgetting, we employ a self-distillation strategy, retaining pre-trained knowledge of CLIP to mitigate over-fitting. Evaluation results on the Market1501, DukeMTMC-ReID, Occluded-Duke, Occluded-ReID, and P-DukeMTMC datasets demonstrate that ProFD achieves state-of-the-art results. Our project is available at: https://github.com/Cuixxx/ProFD

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses two key issues in person re-identification (ReID) under occlusion:

1. **Loss of part appearance information due to occlusion**: Occlusion hides the visual information of certain body parts in the training data, so these parts appear far less often in the dataset.
2. **Noisy spatial information from external models**: Because the training data of external models differs in domain from the ReID datasets, the pseudo-labels these models generate inevitably contain errors. This noise makes it difficult for the model to accurately locate human body parts, ultimately leading to misaligned part features.

To address these issues, the authors propose a Prompt-guided Feature Disentangling (ProFD) method. By leveraging the rich pre-trained knowledge in the text modality, ProFD can generate well-aligned part features. Specifically, ProFD designs prompts specific to different body parts and uses noisy segmentation masks to preliminarily align visual and text embeddings, giving the text prompts spatial awareness. To mitigate the noise from external masks, ProFD employs a hybrid-attention decoder that enforces spatial and semantic consistency during decoding. To prevent catastrophic forgetting, ProFD also adopts a self-distillation strategy that retains the pre-trained knowledge of CLIP and mitigates overfitting.

### Main Contributions

1. **Proposing a new framework, ProFD**: Text prompts guide the disentangling of part features in occluded person re-identification.
2. **Proposing a new self-distillation strategy**: This better retains pre-trained multimodal knowledge and mitigates overfitting.
3. **Conducting extensive experiments**: Experiments were conducted on the holistic datasets Market1501 and DukeMTMC-ReID, as well as the occluded datasets Occluded-Duke, Occluded-ReID, and P-DukeMTMC. The results show that ProFD outperforms existing methods on multiple metrics, particularly on the Occluded-ReID dataset, where mAP increased by at least 8.3% and Rank-1 accuracy increased by at least 4.8%.
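
The mask-guided alignment between part prompts and visual features described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of masked average pooling, and the `1 - cosine` loss are all assumptions chosen for illustration, and the tiny feature vectors stand in for real CLIP image and text embeddings. The idea shown is only that a (possibly noisy) part mask selects which patch features are pooled, and the pooled feature is pulled toward the text embedding of the matching part prompt.

```python
import math


def masked_average_pool(patch_features, mask):
    """Average patch features weighted by one part mask.

    patch_features: list of N feature vectors, one per image patch.
    mask: list of N weights in [0, 1] for a single body part (hypothetical
          output of an external, possibly noisy, segmentation model).
    """
    total = sum(mask)
    dim = len(patch_features[0])
    if total == 0:
        # Part fully occluded or missed by the mask: nothing to pool.
        return [0.0] * dim
    return [
        sum(w * f[d] for w, f in zip(mask, patch_features)) / total
        for d in range(dim)
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def alignment_loss(patch_features, part_masks, prompt_embeddings):
    """Mean (1 - cosine) between pooled part features and prompt embeddings.

    Assumed illustrative loss, not the loss used in the paper.
    Parts whose mask is empty are skipped.
    """
    losses = []
    for mask, text_embedding in zip(part_masks, prompt_embeddings):
        if sum(mask) == 0:
            continue
        pooled = masked_average_pool(patch_features, mask)
        losses.append(1.0 - cosine_similarity(pooled, text_embedding))
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: 4 patches with 2-D features, 2 body parts.
patches = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]   # part 0 covers patches 0-1, part 1 covers 2-3
prompts = [[1.0, 0.0], [0.0, 1.0]]      # stand-ins for part prompt text embeddings

aligned = alignment_loss(patches, masks, prompts)
misaligned = alignment_loss(patches, masks, prompts[::-1])
print(aligned, misaligned)
```

In this toy setup the loss is lowest when each pooled part feature points in the same direction as its own prompt embedding and grows when the prompts are swapped, which is the behavior an alignment objective of this kind is meant to encourage.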

ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification

Joining Features by Global Guidance with Bi-Relevance Trihard Loss for Person Re-Identification

Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification.

Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer

Part-based Representation Enhancement for Occluded Person Re-identification

Dynamic Feature Pruning and Consolidation for Occluded Person Re-Identification

Interesting Receptive Region and Feature Excitation for Partial Person Re-identification

Feature Erasing and Diffusion Network for Occluded Person Re-Identification

Identifying Visible Parts via Pose Estimation for Occluded Person Re-Identification

Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

Pose-Guided Feature Alignment for Occluded Person Re-Identification

Feature Mixing and Disentangling for Occluded Person Re-Identification

DDRN: a Data Distribution Reconstruction Network for Occluded Person Re-Identification

A Study of Occluded Person Re-Identification for Shared Feature Fusion with Pose-Guided and Unsupervised Semantic Segmentation

Part Representation Learning with Teacher-Student Decoder for Occluded Person Re-identification

Semantically enhanced attention map‐driven occluded person re‐identification

Part-aware Network: a Simple but Efficient Method for Occluded Person Re-Identification

Part-Attention Based Model Make Occluded Person Re-Identification Stronger

Foreground-guided textural-focused person re-identification

Feature Completion for Occluded Person Re-Identification

Feature Completion Transformer for Occluded Person Re-identification