Discriminative Spatial Feature Learning for Person Re-Identification

Peixi Peng,Yonghong Tian,Yangru Huang,Xiangqian Wang,Huilong An
DOI: https://doi.org/10.1145/3394171.3413730
2020-01-01
Abstract:Person re-identification (ReID) aims to match detected pedestrian images from multiple non-overlapping cameras. Most existing methods employ a backbone CNN to extract a vectorized feature representation by performing some global pooling operations (such as global average pooling and global max pooling) on the 3D feature map (i.e., the output of the backbone CNN). Although simple and effective in some situations, the global pooling operation only focuses on the statistical properties and ignores the spatial distribution of the feature map. Hence, it can not distinguish two feature maps when they have similar response values located in totally different positions. To handle this challenge, a novel method is proposed to learn the discriminative spatial features. Firstly, a self-constrained spatial transformer network (SC-STN) is introduced to handle the misalignments caused by detection errors. Then, based on the prior knowledge that the spatial structure of a pedestrian often keeps robust in vertical orientation of images, a novel vertical convolution network (VCN) is proposed to extract the spatial feature in vertical. Extensive experimental evaluations on several benchmarks demonstrate that the proposed method achieves state-of-the-art performances by introducing only a few parameters to the backbone.
What problem does this paper attempt to address?