Chatting with Interactive Memory for Text-based Person Retrieval (chinamm 2024)

Chen He, Shenshen Li, Zheng Wang, Hua Chen, Fumin Shen, Xing Xu
DOI: https://doi.org/10.21203/rs.3.rs-4825548/v1
2024-01-01
Abstract: Text-based person retrieval aims to match a specific pedestrian image with textual descriptions. Traditional approaches have largely focused on a "single-shot" query consisting of one text description. This setup may not align well with real-world scenarios and cannot fully capture detailed cues, since users may employ multiple, partial queries to describe a pedestrian. To overcome this discrepancy, we introduce a novel model termed Chatting with Interactive Memory (CIM) for the text-based person retrieval task. Our CIM model facilitates a more nuanced and interactive search process by allowing users to engage in multiple rounds of dialogue, providing a more comprehensive description of the person of interest. The proposed CIM model is structured around two pivotal components: (1) the Interactive Retrieval Module, which leverages interactive memory to dynamically process dialogue and enhance image retrieval, and (2) the Q&A Module, crafted to simulate real user interactions. Our extensive evaluations on three widely used datasets (CUHK-PEDES, ICFG-PEDES, and RSTPReid) illustrate the superior performance of the proposed CIM framework, significantly improving the precision and user engagement in text-based person retrieval tasks.