Person Re-Identification with Hierarchical Discriminative Spatial Aggregation
Mingyang Zhang, Yang Xiao, Fu Xiong, Shuai Li, Zhiguo Cao, Zhiwen Fang, Joey Tianyi Zhou
DOI: https://doi.org/10.1109/tifs.2022.3146773
IF: 7.231
2022-01-01
IEEE Transactions on Information Forensics and Security
Abstract: In practice, person re-identification (re-ID) can suffer from severe spatial misalignment caused by inaccurate human detection, variations in human pose and camera viewpoint, etc. To address this, a hierarchical discriminative spatial aggregation method is proposed. The key idea is to conduct spatial aggregation on local human parts via global average-pooling to acquire strong tolerance to spatial misalignment, while jointly applying VLAD encoding to the local parts to improve discriminative power. The proposition is built on NetVLAD to ensure end-to-end deep learning capacity. Because the fine-grained nature of the person re-ID task is not well addressed by the original NetVLAD model, which was designed for scene recognition, a feature refinement layer consisting of 1 fully-connected (FC) layer and 2 batch normalization (BN) layers is added on top of the raw NetVLAD layer to enhance discriminative power and training convergence. A human body occlusion and background component dropout scheme is also proposed to resist the effect of serious occlusion. Technically, a refined codeword initialization scheme is proposed to alleviate the potential codeword imbalance problem caused by naive random initialization. The proposed discriminative spatial aggregation approach is then applied hierarchically to multi-resolution convolutional feature map layers via early feature fusion, to jointly exploit richer semantic and fine-grained visual clues. Extensive experiments on 6 datasets (i.e., CUHK03, DukeMTMC-reID, Occluded-DukeMTMC, Market-1501, MSMT17 and Occluded-REID) verify the effectiveness of our proposition. The source code and supporting material are available at https://github.com/zmyme/HDSA-reID.
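To make the aggregation idea in the abstract concrete, the following is a minimal sketch of soft-assignment VLAD encoding pooled over spatial locations. It is not the authors' implementation. The function names (`soft_assign`, `vlad_aggregate`), the sharpness parameter `alpha`, the distance-based soft assignment, and the toy feature sizes are all assumptions made for illustration. The sketch only shows why pooling residuals over all locations is insensitive to where a local feature appears: shuffling the spatial order leaves the descriptor unchanged up to floating-point rounding.

```python
import math
import random


def soft_assign(x, codewords, alpha=10.0):
    """Softmax over negative squared distances to each codeword (assumed form)."""
    logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codewords]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vlad_aggregate(features, codewords, alpha=10.0):
    """Average soft-weighted residuals over all locations, then L2-normalize.

    features:  list of N local descriptors, each a list of D floats
    codewords: list of K codewords, each a list of D floats
    returns:   flat list of K * D floats
    """
    num_codewords = len(codewords)
    dim = len(codewords[0])
    vlad = [[0.0] * dim for _ in range(num_codewords)]
    for x in features:
        weights = soft_assign(x, codewords, alpha)
        for k in range(num_codewords):
            for d in range(dim):
                vlad[k][d] += weights[k] * (x[d] - codewords[k][d])
    # Average pooling over spatial locations (hypothetical stand-in for the
    # global average-pooling mentioned in the abstract).
    n = len(features)
    flat = [v / n for row in vlad for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


if __name__ == "__main__":
    random.seed(0)
    # Toy data: 12 spatial locations, 4-dimensional features, 3 codewords.
    feats = [[random.gauss(0, 1) for _ in range(4)] for _ in range(12)]
    cws = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
    original = vlad_aggregate(feats, cws)
    shuffled = feats[:]
    random.shuffle(shuffled)  # simulate spatial misalignment of local parts
    moved = vlad_aggregate(shuffled, cws)
    print(max(abs(a - b) for a, b in zip(original, moved)))
```

In NetVLAD the soft assignment and the codewords are learned end to end. Here they are fixed random values, so the sketch illustrates only the order-invariance of the pooled descriptor and says nothing about its discriminative power.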