Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification

Yajing Zhai, Yawen Zeng, Zhiyong Huang, Zheng Qin, Xin Jin, Da Cao
DOI: https://doi.org/10.1609/aaai.v38i7.28524
2024-01-01
Abstract: Fine-grained attribute descriptions can significantly supplement the valuable semantic information of a person image, which is vital to the success of the person re-identification (ReID) task. However, current ReID algorithms typically fail to effectively leverage the rich contextual information available, primarily due to their reliance on simplistic and coarse utilization of image attributes. Recent advances in artificial intelligence generated content have made it possible to automatically generate plentiful fine-grained attribute descriptions and make full use of them. This paper therefore explores the potential of using the generated multiple person attributes as prompts in ReID tasks with off-the-shelf (large) models for more accurate retrieval results. To this end, we present a new framework called Multi-Prompts ReID (MP-ReID), based on prompt learning and language models, to fully exploit fine-grained attributes to assist the ReID task. Specifically, MP-ReID first learns to hallucinate diverse, informative, and promptable sentences for describing the query images. This procedure includes (i) explicit prompts stating which attributes a person has and (ii) implicit learnable prompts for adjusting/conditioning the criteria used for matching this person's identity. Explicit prompts are obtained by ensembling generation models, such as ChatGPT and VQA models. Moreover, an alignment module is designed to fuse the multi-prompts (i.e., explicit and implicit ones) progressively and mitigate the cross-modal gap. Extensive experiments on the existing attribute-involved ReID datasets, namely Market1501 and DukeMTMC-reID, demonstrate the effectiveness and rationality of the proposed MP-ReID solution.
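
The distinction between explicit and implicit prompts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sentence template, the attribute names, the number of learnable tokens, and the `toy_embed` stand-in for a real text encoder are all assumptions made for illustration. The ensembling of generation models and the alignment module are not covered.

```python
import random


def explicit_prompt(attributes):
    """Turn an attribute dict into a plain-language sentence (hypothetical template)."""
    parts = [f"{name} is {value}" for name, value in sorted(attributes.items())]
    return "A photo of a person whose " + ", ".join(parts) + "."


def init_implicit_prompt(num_tokens, dim, seed=0):
    """Create learnable context vectors; training would update them by gradient descent."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_tokens)]


def toy_embed(sentence, dim):
    """Stand-in for a text encoder's token embeddings: one deterministic vector per word."""
    vectors = []
    for word in sentence.split():
        rng = random.Random(word)  # seeding with the word keeps the vector reproducible
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def build_multi_prompt(attributes, implicit_tokens):
    """Prepend the implicit (learnable) tokens to the embedded explicit sentence."""
    dim = len(implicit_tokens[0])
    sentence = explicit_prompt(attributes)
    return sentence, implicit_tokens + toy_embed(sentence, dim)


# Hypothetical attributes, as a captioning or VQA model might return them.
attrs = {"hair": "short", "upper clothing": "a red jacket"}
sentence, sequence = build_multi_prompt(attrs, init_implicit_prompt(num_tokens=4, dim=8))
print(sentence)
print(len(sequence), "token vectors of dimension", len(sequence[0]))
```

In this sketch the explicit part is fixed, human-readable text, while the implicit part is a set of free vectors with no fixed wording. That difference is what would let the implicit prompts adapt to the matching objective during training.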