PA-Net: Learning local features using by pose attention for short-term person re-identification
Kai Wang, Shichao Dong, Nian Liu, Junhui Yang, Tao Li, Qinghua Hu
DOI: https://doi.org/10.1016/j.ins.2021.02.066
IF: 8.1
2021-07-01
Information Sciences
Abstract:<p>Person re-identification (Re-ID) is an important but challenging task in video surveillance applications. In Re-ID tasks, pose is an extremely useful cue for identifying a person, even from a back view. Therefore, pose-detection models may learn features that are beneficial to the Re-ID task, and fusing their feature maps into the Re-ID model may improve Re-ID performance. This study addresses two key problems in integrating pose cues. The first is how to reduce the noise caused by cross-domain datasets. The second is how to fuse the feature maps to better exploit high-level semantic pose cues. To address these problems, we first propose PA-Net, which combines a pose attention stream and a global attention stream. The global attention stream distinguishes persons with different global appearances, and the pose attention stream distinguishes persons with similar global appearances but different poses. We then present the pose attention stream, which learns local features to reduce the noise in the pose cues caused by the cross-domain datasets and to provide more semantic information for the Re-ID task. The effects of the proposed pose attention are demonstrated in an ablation study, and comparative experiments show that PA-Net achieves state-of-the-art performance. Because the human body is flexible, the same person may look very different in different poses, which leads to a large intra-class variance. To align human bodies across poses, it is intuitive to fuse pose-detection results into the Re-ID model. Previous works have used pose cues in different ways, and all of them report improved performance. However, using pose cues directly for body alignment makes the model sensitive to errors in the pose-detection results.
Because pose is such a useful identity cue, the pose-detection model may also learn features that benefit Re-ID. PA-Net is therefore designed as a two-stream model that learns local features through pose attention generated by the pose-detection model. In addition to aligning the flexible human body and providing more cues for Re-ID, the local features also have a positive effect on the global features: although only global features are used for Re-ID during inference, performance still improves over the baseline. That is, the proposed pose attention adds no inference-time cost, which makes PA-Net more suitable for real-time applications.</p>
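The abstract describes a two-stream design in which pose attention produces local features during training, while inference uses only the global features. The sketch below is hypothetical and intended only to make that idea concrete; it is not the authors' implementation. It assumes that the pose detector outputs one non-negative heatmap per keypoint at the same spatial resolution as the backbone feature map, and it swaps in simple heatmap-weighted average pooling as the "pose attention". The function names `global_pool`, `pose_attention_pool`, and `pa_net_forward` are illustrative, as are the array shapes.

```python
import numpy as np


def global_pool(feat):
    """Global average pooling over space: (C, H, W) -> (C,)."""
    return feat.mean(axis=(1, 2))


def pose_attention_pool(feat, heatmaps, eps=1e-6):
    """Hypothetical pose-attention pooling: one local feature per keypoint.

    feat:     (C, H, W) backbone feature map (assumed shape)
    heatmaps: (K, H, W) non-negative keypoint heatmaps (assumed pose-detector output)
    returns:  (K, C) local features, each a heatmap-weighted spatial average of feat
    """
    # Normalize each heatmap so that its weights sum to (approximately) 1 over H x W.
    weights = heatmaps / (heatmaps.sum(axis=(1, 2), keepdims=True) + eps)
    # For keypoint k and channel c: sum over (h, w) of weights[k, h, w] * feat[c, h, w].
    return np.einsum("khw,chw->kc", weights, feat)


def pa_net_forward(feat, heatmaps=None, training=True):
    """Two-stream forward pass (sketch under the assumptions above).

    Training: return the global feature and the pose-attention local features,
    so that losses on both streams can shape a shared backbone (assumption).
    Inference: return only the global feature, so the pose branch adds no cost.
    """
    g = global_pool(feat)
    if not training:
        return g
    return g, pose_attention_pool(feat, heatmaps)


if __name__ == "__main__":
    # Toy example: C=2 channels on a 2x2 grid, K=2 keypoints.
    feat = np.arange(8, dtype=float).reshape(2, 2, 2)
    heatmaps = np.array([[[1.0, 0.0], [0.0, 0.0]],   # keypoint 0 focuses on one cell
                         [[1.0, 1.0], [1.0, 1.0]]])  # keypoint 1 is spread uniformly
    g, local = pa_net_forward(feat, heatmaps, training=True)
    print(g)      # global feature
    print(local)  # one row per keypoint
    print(pa_net_forward(feat, training=False))  # inference path: global feature only
```

In this toy run, the global feature is `[1.5, 5.5]`. The local features are approximately `[0, 4]` for the focused keypoint and `[1.5, 5.5]` for the uniform one. The inference call skips the pose branch entirely, which mirrors the abstract's claim that pose attention adds no inference-time cost. How PA-Net actually computes its attention and fuses the two streams is described in the paper itself, not here.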
computer science, information systems