SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm

Jiandong Jin,Xiao Wang,Chenglong Li,Lili Huang,Jin Tang
2023-12-04
Abstract:Current pedestrian attribute recognition (PAR) algorithms are developed based on multi-label or multi-task learning frameworks, which aim to discriminate the attributes using specific classification heads. However, these discriminative models are easily influenced by imbalanced data or noisy samples. Inspired by the success of generative models, we rethink the pedestrian attribute recognition scheme and believe the generative models may perform better on modeling dependencies and complexity between human attributes. In this paper, we propose a novel sequence generation paradigm for pedestrian attribute recognition, termed SequencePAR. It extracts the pedestrian features using a pre-trained CLIP model and embeds the attribute set into query tokens under the guidance of text prompts. Then, a Transformer decoder is proposed to generate the human attributes by incorporating the visual features and attribute query tokens. The masked multi-head attention layer is introduced into the decoder module to prevent the model from remembering the next attribute while making attribute predictions during training. Extensive experiments on multiple widely used pedestrian attribute recognition datasets fully validated the effectiveness of our proposed SequencePAR. The source code and pre-trained models will be released at <a class="link-external link-https" href="https://github.com/Event-AHU/OpenPAR" rel="external noopener nofollow">this https URL</a>.
Computer Vision and Pattern Recognition,Multimedia
What problem does this paper attempt to address?
### What problems does this paper attempt to solve? This paper aims to solve several key problems in Pedestrian Attribute Recognition (PAR): 1. **The impact of data imbalance and noisy samples**: - Existing PAR algorithms based on multi - label or multi - task learning frameworks are vulnerable to imbalanced data or noisy samples. For example, a large number of negative samples may lead to sparse attribute prediction, and inaccurately labeled datasets may introduce noise and affect model performance. 2. **Weak semantic connection between attributes**: - Existing methods usually regress each attribute simultaneously, resulting in a weak semantic connection between attributes and being unable to fully capture the complex dependencies between attributes. 3. **Limitations of traditional discriminative models**: - Discriminative models perform poorly when dealing with complex, highly - dependent tasks. They distinguish attributes through specific classification heads but have difficulty modeling the complexity and dependencies between attributes. ### Proposed solutions To solve the above problems, the authors propose a new pedestrian attribute recognition framework based on sequence generation, called **SequencePAR**. Its main innovations include: 1. **Redefining the attribute recognition task as a sequence generation problem**: - By regarding attribute recognition as an image caption generation task, the relationships between attributes can be better modeled. Specifically, SequencePAR uses a pre - trained CLIP model to extract pedestrian image features and embeds attribute descriptions into query tokens. Then, it uses a Transformer decoder to generate an attribute sequence. 2. **Introducing the masked multi - head attention mechanism**: - During the training process, a masked multi - head attention layer is introduced to prevent the model from memorizing the next attribute, ensuring that the current attribute prediction only depends on the previous context information. This helps to improve the model's generalization ability and robustness. 3. **Fusing visual and text features**: - SequencePAR not only utilizes the visual features of pedestrian images but also combines the text representations of attributes, thereby more comprehensively capturing the semantic relationships between attributes. ### Experimental verification The authors conducted a large number of experiments on several widely - used pedestrian attribute recognition datasets to verify the effectiveness of SequencePAR. The experimental results show that SequencePAR outperforms existing state - of - the - art methods in multiple metrics, especially in dealing with imbalanced data and noisy samples. ### Summary This paper re - examines the pedestrian attribute recognition task by introducing the idea of generative models and proposes the novel framework SequencePAR. This framework can better model the complex dependencies between attributes, improves the model's robustness and generalization ability, and solves the problems of data imbalance and noisy samples in existing methods.