Abstract:<p>Person re-identification (Re-ID) is an important but challenging task in video surveillance applications. In Re-ID tasks, pose is an extremely useful cue for identifying a person, even from the back view. Pose-detection models may therefore learn features that are beneficial to the Re-ID task, and fusing their feature maps into the Re-ID model may improve Re-ID performance. This study addresses two key problems in integrating pose cues. One is how to reduce the noise caused by cross-domain datasets. The other is how to fuse the feature maps so as to better utilize high-level semantic pose cues. To address these two problems, we first propose PA-Net, which combines a pose attention stream with a global attention stream: the global attention stream distinguishes persons with different global appearances, and the pose attention stream distinguishes persons with similar global appearance but different poses. We then present a pose attention stream that learns local features, which reduces the noise in the pose cues caused by the cross-domain datasets and provides more semantic information for the Re-ID task. The effects of the proposed pose attention are demonstrated in an ablation study, and comparative experiments show that PA-Net achieves state-of-the-art performance. Because the human body is flexible, the same person may look very different in different poses, leading to a large intra-class variance. To align the human body across different poses, it is intuitive to fuse pose detection results into the Re-ID model. Previous works have used pose cues in different ways, all of which report improved performance. However, directly using pose cues for body alignment makes the model sensitive to the pose detection results. Moreover, pose is a very useful cue for identifying a person: we can identify a person from the back view alone. 
Therefore, the pose detection model may also learn features that are beneficial to the Re-ID task, and fusing its feature maps into the Re-ID model may improve Re-ID performance. This paper proposes a two-stream model named PA-Net, which learns local features through pose attention generated by the pose detection model. In addition to aligning the flexible human body and providing more cues for Re-ID, the local features also have a positive impact on the global features. During inference, although only global features are used for person Re-ID, the model still improves on the baseline. That is, the proposed pose attention does not add any inference time cost, which makes PA-Net more suitable for real-time applications. The effects of the proposed pose attention are demonstrated by an ablation study, and comparative experiments show that PA-Net achieves state-of-the-art performance.</p>
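The two-stream idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the pose attention is computed, so the sketch assumes that attention takes the form of per-keypoint heatmaps used as spatial pooling weights. The function names (`global_feature`, `pose_attention_features`, `embed`) and the `training` flag are hypothetical. The sketch only shows the structural point that local features are computed during training while inference returns the global feature alone.

```python
import numpy as np

def global_feature(fmap):
    """Global stream (assumed): average-pool a (C, H, W) feature map to (C,)."""
    return fmap.mean(axis=(1, 2))

def pose_attention_features(fmap, heatmaps, eps=1e-8):
    """Pose stream (assumed): pool the feature map under each keypoint heatmap.

    fmap:     (C, H, W) backbone feature map
    heatmaps: (K, H, W) non-negative maps from a pose detector (hypothetical input)
    returns:  (K, C) one local feature per keypoint
    """
    # Normalize each heatmap so its spatial weights sum to (approximately) 1.
    weights = heatmaps / (heatmaps.sum(axis=(1, 2), keepdims=True) + eps)
    # Attention-weighted spatial pooling for every keypoint.
    return np.einsum("khw,chw->kc", weights, fmap)

def embed(fmap, heatmaps=None, training=False):
    """Return (global, local) features in training, global only at inference."""
    g = global_feature(fmap)
    if training:
        return g, pose_attention_features(fmap, heatmaps)
    return g  # no pose model is needed at inference time

# Usage: a toy 4-channel, 3x2 feature map with two hypothetical keypoints.
fmap = np.arange(24, dtype=float).reshape(4, 3, 2)
heatmaps = np.zeros((2, 3, 2))
heatmaps[0, 1, 1] = 1.0   # keypoint 0 attends to a single location
heatmaps[1] = 1.0         # keypoint 1 attends uniformly
g, local = embed(fmap, heatmaps, training=True)
query_embedding = embed(fmap)  # inference path uses the global feature only
```

In this sketch the inference path never touches `heatmaps`, which mirrors the abstract's claim that pose attention adds no inference time cost. How the local features influence the global ones during training (losses, shared layers) is not stated in the abstract and is left out here.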

Pose-Guided Attention and Alignment (PGAA) for Dislocated Target Person Re-Identification Based on Multi-feature Fusion

A multi-branch attention and alignment network for person re-identification

Pose-Guided Feature Alignment for Occluded Person Re-Identification

Pose Matters: Pose Guided Graph Attention Network for Person Re-Identification

Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification.

Multi-Branch Feature Alignment Network for Misaligned and Occluded Person Re-Identification

PAII: A Pose Alignment Network with Information Interaction for Person Re-identification.

Pose-guided spatiotemporal alignment for video-based person Re-identification.

Person Re-identification Incorporating Super-Resolution Enhanced Pose Estimation

Pose-driven Person Re-identification with Local Feature Alignment

Fine-Grained Spatial Alignment Model for Person Re-Identification with Focal Triplet Loss.

PA-Net: Learning local features using by pose attention for short-term person re-identification

Pixel Level Alignment Person Re-Identification based on Multi-Branch Part Reconstructing.

Ensemble of pose aligned features for person re-identification

Reliable Part Guided Multiple Level Attention Learning for Person Re-Identification

Occluded Person Re-Identification with Pose Estimation Correction and Feature Reconstruction

Learning Part-Alignment Feature for Person Re-Identification with Spatial-Temporal-based Re-Ranking Method

An Orientation-Aware Attention Network for Person Re-Identification

Pose-guided Neural Network with Hybrid Representation for Person Re-Identification

Person Re-identification with Pose Variation Aware Data Augmentation

Progressive Feature Alignment for Occluded Person Re-Identification