ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification

Can Cui, Siteng Huang, Wenxuan Song, Pengxiang Ding, Min Zhang, Donglin Wang
2024-09-30
Abstract: To address the occlusion issues in person Re-Identification (ReID) tasks, many methods have been proposed to extract part features by introducing external spatial information. However, due to missing part appearance information caused by occlusion and noisy spatial information from external models, these purely vision-based approaches fail to correctly learn the features of human body parts from limited training data and struggle to accurately locate body parts, ultimately leading to misaligned part features. To tackle these challenges, we propose a Prompt-guided Feature Disentangling method (ProFD), which leverages the rich pre-trained knowledge in the textual modality to help the model generate well-aligned part features. ProFD first designs part-specific prompts and utilizes noisy segmentation masks to preliminarily align visual and textual embeddings, enabling the textual prompts to have spatial awareness. Furthermore, to alleviate the noise from external masks, ProFD adopts a hybrid-attention decoder, ensuring spatial and semantic consistency during the decoding process to minimize noise impact. Additionally, to avoid catastrophic forgetting, we employ a self-distillation strategy, retaining pre-trained knowledge of CLIP to mitigate over-fitting. Evaluation results on the Market1501, DukeMTMC-ReID, Occluded-Duke, Occluded-ReID, and P-DukeMTMC datasets demonstrate that ProFD achieves state-of-the-art results. Our project is available at: https://github.com/Cuixxx/ProFD
Computer Vision and Pattern Recognition, Multimedia
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses two key issues in person re-identification (ReID) under occlusion:

1. **Loss of part appearance information due to occlusion**: Occlusion hides certain body parts in the training images, so those parts appear far less often in the dataset.
2. **Noisy spatial information from external models**: Because of the domain gap between the external models' training data and the ReID datasets, the pseudo-labels those models produce inevitably contain errors. This noise makes it hard for the model to locate body parts accurately and ultimately leads to misaligned part features.

To address these issues, the authors propose a Prompt-guided Feature Disentangling (ProFD) method, which draws on the rich pre-trained knowledge in the text modality to generate well-aligned part features. Specifically, ProFD designs prompts for individual body parts and uses noisy segmentation masks to preliminarily align visual and text embeddings, giving the text prompts spatial awareness. To limit the effect of noise in the external masks, ProFD employs a hybrid-attention decoder that enforces spatial and semantic consistency during decoding. To prevent catastrophic forgetting, ProFD also adopts a self-distillation strategy that retains the pre-trained knowledge of CLIP and mitigates overfitting.

### Main Contributions

1. **Proposing a new framework, ProFD**: Text prompts guide the disentangling of part features in occluded person re-identification.
2. **Proposing a new self-distillation strategy**: It better retains pre-trained multimodal knowledge and mitigates overfitting.
3. **Conducting extensive experiments**: Experiments were run on the holistic datasets Market1501 and DukeMTMC-ReID, as well as the occluded datasets Occluded-Duke, Occluded-ReID, and P-DukeMTMC. ProFD outperforms existing methods on multiple metrics, most notably on Occluded-ReID, where mAP improves by at least 8.3% and Rank-1 accuracy by at least 4.8%.
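
To make the idea of mask-guided alignment between part prompts and visual features more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the part names, the prompt template, the masked average pooling, and the `1 - cosine` loss are all hypothetical choices made for this example, and the toy vectors stand in for CLIP image and text embeddings.

```python
import math

# Hypothetical part names and prompt template (assumptions for illustration;
# the prompts actually used by ProFD may differ).
PART_NAMES = ["head", "torso", "legs"]
PROMPT_TEMPLATE = "a photo of a person's {part}"


def build_part_prompts(part_names):
    """Fill the template once per body part."""
    return [PROMPT_TEMPLATE.format(part=name) for name in part_names]


def masked_average_pool(features, mask):
    """Average N feature vectors, weighted by a soft part mask.

    features: list of N vectors (each a list of D floats), one per location.
    mask: list of N weights in [0, 1] from an external, possibly noisy, segmenter.
    """
    dim = len(features[0])
    total = sum(mask)
    if total == 0:
        # Part is fully occluded or missed by the segmenter: no visual evidence.
        return [0.0] * dim
    pooled = [0.0] * dim
    for vec, weight in zip(features, mask):
        for d in range(dim):
            pooled[d] += weight * vec[d]
    return [p / total for p in pooled]


def cosine_similarity(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def alignment_loss(features, part_masks, text_embeddings):
    """Mean of (1 - cosine) between each mask-pooled part feature and its
    prompt embedding. Parts whose mask is empty are skipped."""
    losses = []
    for mask, text in zip(part_masks, text_embeddings):
        if sum(mask) == 0:
            continue
        pooled = masked_average_pool(features, mask)
        losses.append(1.0 - cosine_similarity(pooled, text))
    return sum(losses) / len(losses) if losses else 0.0


# Toy data: 4 spatial locations with 2-dimensional features.
toy_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_masks = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]  # "legs" is occluded
toy_text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for prompt embeddings

prompts = build_part_prompts(PART_NAMES)
loss = alignment_loss(toy_features, toy_masks, toy_text)
```

In this toy setup the visible parts already match their prompt embeddings, so the loss is zero, and the occluded part contributes nothing. Skipping empty masks is one simple way to avoid penalising parts with no visual evidence; the summary above does not say how ProFD handles that case.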