VAC-Net: Visual Attention Consistency Network for Person Re-identification

Weidong Shi,Yunzhou Zhang,Shangdong Zhu,Yixiu Liu,Sonya Coleman,Dermot Kerr
DOI: https://doi.org/10.1145/3512527.3531409
2022-01-01
Abstract:Person re-identification (ReID) is a crucial aspect of recognising pedestrians across multiple surveillance cameras. Even though significant progress has been made in recent years, the viewpoint change and scale variations still affect model performance. In this paper, we observe that it is beneficial for the model to handle the above issues when boost the consistent feature extraction capability among different transforms (e.g., flipping and scaling) of the same image. To this end, we propose a visual attention consistency network (VAC-Net). Specifically, we propose Embedding Spatial Consistency (ESC) architecture with flipping, scaling and original forms of the same image as inputs to learn a consistent embedding space. Furthermore, we design an Input-Wise visual attention consistent loss (IW-loss) so that the class activation maps(CAMs) from the three transforms are aligned with each other to enforce their advanced semantic information remains consistent. Finally, we propose a Layer-Wise visual attention consistent loss (LW-loss) to further enforce the semantic information among different stages to be consistent with the CAMs within each branch. These two losses can effectively improve the model to address the viewpoint and scale variations. Experiments on the challenging Market-1501, DukeMTMC-reID, and MSMT17 datasets demonstrate the effectiveness of the proposed VAC-Net.
What problem does this paper attempt to address?