Abstract: In this article, we propose a novel deep Siamese architecture based on a convolutional neural network (CNN) and multi-level similarity perception for the person re-identification (re-ID) problem. Because feature maps at different depths have distinct characteristics, we apply different similarity constraints to the low-level and high-level feature maps during the training stage. With an appropriate similarity comparison mechanism at each level, the proposed approach adaptively learns discriminative local and global feature representations, with the local features being more sensitive to part-level salient patterns that are relevant to re-identifying people across cameras. In addition, a novel strong activation pooling strategy is applied to the last convolutional layer to aggregate abstract local features into a more representative description. Based on this, we construct the final feature embedding by jointly encoding the original global features and the discriminative local features. Our framework has two further benefits. First, classification constraints can be easily incorporated into the framework, forming a unified multi-task network together with the similarity constraints. Second, because the similarity-comparison information is encoded in the network’s learned parameters via back-propagation, pairwise input is not necessary at test time. We can therefore extract the features of each gallery image and build an index offline, which is essential for large-scale real-world applications. Experimental results on multiple challenging benchmarks demonstrate that our method achieves strong performance compared with current state-of-the-art approaches.
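The offline indexing point in the abstract can be illustrated with a minimal sketch. It assumes each image has already been mapped to a fixed-length embedding by some trained network; the toy vectors, the function names, and the choice of cosine similarity are illustrative assumptions, not details taken from the article.

```python
import numpy as np


def l2_normalize(features: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)


def build_gallery_index(gallery_features: np.ndarray) -> np.ndarray:
    """Offline step: normalize and store one embedding per gallery image."""
    return l2_normalize(np.asarray(gallery_features, dtype=np.float64))


def rank_gallery(query_feature: np.ndarray, index: np.ndarray) -> np.ndarray:
    """Online step: return gallery indices sorted from most to least similar."""
    query = l2_normalize(np.asarray(query_feature, dtype=np.float64)[None, :])[0]
    similarities = index @ query  # cosine similarity against every gallery entry
    return np.argsort(-similarities, kind="stable")


# Toy embeddings standing in for the network's output (hypothetical values).
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.7, 0.7, 0.0]])
index = build_gallery_index(gallery)          # computed once, offline
ranking = rank_gallery(np.array([0.9, 0.1, 0.0]), index)
print(ranking.tolist())
```

Because the query is embedded on its own, no query-gallery image pair is passed through the network at test time; only the cheap similarity computation against the stored index runs online.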

SSN3D: Self-Separated Network to Align Parts for 3D Convolution in Video Person Re-Identification

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

Contribution-Based Multi-Stream Feature Distance Fusion Method with $k$-Distribution Re-Ranking for Person Re-Identification

Joint Uneven Channel Information Network with Blend Metric Loss for Person Re-Identification

Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

MSTN: A Multi-granular Spatial–Temporal Network for video-based person re-identification

Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Multi-Scale 3D Convolution Network for Video Based Person Re-Identification

Person Re-Identification by Unsupervised Video Matching

Densely Semantically Aligned Person Re-Identification

AA-RGTCN: Reciprocal Global Temporal Convolution Network with Adaptive Alignment for Video-Based Person Re-Identification

Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion

Video-Based Person Re-Identification Using Spatial-Temporal Memory Coupling Network

Deep Recurrent Convolutional Networks for Video-based Person Re-identification: An End-to-End Approach

Triplet Attention Network for Video-Based Person Re-Identification

Video-based Person Re-identification with Two-stream Convolutional Network and Co-attentive Snippet Embedding

Multi-level Similarity Perception Network for Person Re-identification

Parallel Attention with Weighted Efficient Network for Video-Based Person Re-Identification

Multi-Scale Temporal Cues Learning for Video Person Re-Identification