Prompting Continual Person Search

Pengcheng Zhang, Xiaohan Yu, Xiao Bai, Jin Zheng, Xin Ning
2024-10-25
Abstract: Person search techniques have advanced rapidly in recent years, driven by their practical value and challenging goals. Despite this progress, existing person search models still lack the ability to continually learn from increasing real-world data and to adaptively process input from different domains. To this end, this work introduces the continual person search task, which sequentially learns on multiple domains and then performs person search on all seen domains. This requires balancing the stability and plasticity of the model so that it continually learns new knowledge without catastrophic forgetting. For this, we propose a Prompt-based Continual Person Search (PoPS) model. First, we design a compositional person search transformer to construct an effective pre-trained transformer without exhaustive pre-training from scratch on large-scale person search data. This serves as the foundation for prompt-based continual learning. On top of that, we design a domain incremental prompt pool with a diverse attribute matching module. For each domain, we independently learn a set of prompts to encode the domain-oriented knowledge. Meanwhile, we jointly learn a group of diverse attribute projections and prototype embeddings to capture discriminative domain attributes. By matching an input image with the learned attributes across domains, the learned prompts can be properly selected for model inference. Extensive experiments are conducted to validate the proposed method for continual person search. The source code is available at https://github.com/PatrickZad/PoPS.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to address?

This paper addresses **Continual Person Search (CPS)**. Although existing person search models have made significant progress on specific datasets, they cannot continually learn from ever-increasing real-world data and have difficulty adapting to input from different domains. To address this, the authors propose a model named **Prompt-based Continual Person Search (PoPS)**.

#### Main problems:

1. **Lack of continual learning ability**: Existing models cannot keep learning from new domain data without forgetting previous knowledge.
2. **Poor domain adaptability**: Existing models have difficulty processing input from different domains, especially when the domain differences are not obvious.
3. **High cost of large-scale pre-training**: Redesigning and pre-training a Transformer specifically for person search requires a large amount of data and computing resources.

#### Solutions:

- **Propose the CPS task**: The model must learn on multiple domains sequentially and then perform person search on all domains it has seen. This requires balancing plasticity (learning new knowledge) with stability (avoiding catastrophic forgetting).
- **Construct a compositional person search Transformer**: A pre-trained hierarchical vision Transformer (such as Swin) is extended with a simple feature pyramid so that it can localize persons. This avoids the cost of large-scale pre-training from scratch.
- **Design a domain-incremental prompt pool**: Each domain independently learns a set of prompts to encode domain-specific knowledge. At the same time, a group of diverse attribute projections and prototype embeddings is jointly learned to capture discriminative domain attributes. By matching the input image with the learned attributes, the appropriate prompts are selected for inference.

### Summary

This paper introduces the continual person search problem and designs a prompt-based continual learning framework, PoPS, which can keep adapting to new person search domains without forgetting previous knowledge. The method avoids the cost of large-scale pre-training from scratch and improves the adaptability and robustness of the model across domains.
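
The prompt selection step described above (matching an input image against per-domain attributes to pick that domain's prompts) can be illustrated with a minimal sketch. It is not the authors' implementation. The names `DomainEntry`, `PromptPool`, `match_score`, and `select` are hypothetical, and the scoring rule (mean cosine similarity between each projected feature and its prototype) is an assumption chosen for illustration. The text above only says that attribute projections and prototype embeddings are learned and used for matching.

```python
import math
from dataclasses import dataclass

Vector = list[float]
Matrix = list[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm > 0 else 0.0


@dataclass
class DomainEntry:
    """Hypothetical per-domain record: prompts plus attribute matchers."""
    name: str
    prompts: list[Vector]      # prompts learned for this domain only
    projections: list[Matrix]  # one attribute projection per attribute
    prototypes: list[Vector]   # one prototype embedding per attribute

    def match_score(self, feature: Vector) -> float:
        # Assumed rule: average similarity between each projected
        # feature and the corresponding attribute prototype.
        sims = [cosine(matvec(p, feature), e)
                for p, e in zip(self.projections, self.prototypes)]
        return sum(sims) / len(sims)


class PromptPool:
    """Grows by one entry per domain; earlier entries are left untouched."""

    def __init__(self) -> None:
        self.entries: list[DomainEntry] = []

    def add_domain(self, entry: DomainEntry) -> None:
        self.entries.append(entry)

    def select(self, feature: Vector) -> DomainEntry:
        # Pick the domain whose learned attributes best match the input.
        return max(self.entries, key=lambda e: e.match_score(feature))


# Toy usage with 2-D features and one attribute per domain.
identity = [[1.0, 0.0], [0.0, 1.0]]
pool = PromptPool()
pool.add_domain(DomainEntry("domain_a", [[0.1, 0.2]], [identity], [[1.0, 0.0]]))
pool.add_domain(DomainEntry("domain_b", [[0.3, 0.4]], [identity], [[0.0, 1.0]]))
selected = pool.select([0.9, 0.1])
print(selected.name)  # prints "domain_a"
```

Appending a new `DomainEntry` per domain, rather than updating a shared set of prompts, mirrors the idea that each domain's prompts are learned independently, so adding a domain does not overwrite what was stored for earlier ones.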