Abstract: In this article, we propose a novel deep Siamese architecture based on a convolutional neural network (CNN) and multi-level similarity perception for the person re-identification (re-ID) problem. Because low-level and high-level feature maps have distinct characteristics, we apply a different similarity constraint to each during the training stage. With an appropriate similarity comparison mechanism at each level, the proposed approach adaptively learns discriminative local and global feature representations; the local features are more sensitive in localizing prominent part-level patterns relevant to re-identifying people across cameras. In addition, a novel strong activation pooling strategy is applied to the last convolutional layer to aggregate abstract local features into a more representative feature representation. On this basis, we form the final feature embedding by jointly encoding the original global features and the discriminative local features. Our framework has two further benefits. First, classification constraints can be easily incorporated into the framework, forming a unified multi-task network with similarity constraints. Second, because similarity-comparison information is encoded in the network’s learned parameters via back-propagation, pairwise input is not necessary at test time. We can therefore extract the features of each gallery image and build an index off-line, which is essential for large-scale real-world applications. Experimental results on multiple challenging benchmarks demonstrate that our method compares favorably with current state-of-the-art approaches.
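
The off-line indexing idea in the abstract can be illustrated with a minimal sketch. It assumes that each image has already been mapped to a fixed-length embedding by the trained network, and that cosine similarity is used for ranking; the abstract does not specify the metric. The function names, image IDs, and feature values below are hypothetical and serve only as an illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_gallery_index(features: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Off-line step: normalize and store one embedding per gallery image."""
    return {image_id: l2_normalize(f) for image_id, f in features.items()}


def rank_gallery(query: Sequence[float],
                 index: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """On-line step: score a single query embedding against every stored embedding."""
    q = l2_normalize(query)
    scores = [(image_id, sum(a * b for a, b in zip(q, g)))
              for image_id, g in index.items()]
    # Highest cosine similarity first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings standing in for the network's output (assumption).
gallery = build_gallery_index({
    "cam2_person_a": [0.9, 0.1, 0.0],
    "cam2_person_b": [0.0, 1.0, 0.2],
})
ranking = rank_gallery([1.0, 0.2, 0.0], gallery)
print(ranking[0][0])
```

Because the query is embedded on its own, no query-gallery image pair has to pass through the network at test time; only the stored vectors are compared.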

Learning Intra-Video Difference for Person Re-Identification

Instance Hard Triplet Loss for In-video Person Re-identification

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

A Loss Combination Based Deep Model for Person Re-Identification

Joining Features by Global Guidance with Bi-Relevance Trihard Loss for Person Re-Identification

Joint Uneven Channel Information Network with Blend Metric Loss for Person Re-Identification

Contribution-Based Multi-Stream Feature Distance Fusion Method with $k$-Distribution Re-Ranking for Person Re-Identification

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Weighted Triple-Sequence Loss for Video-Based Person Re-Identification

Effective Similarity Measurement for Video-based Person Re-identification

Beyond Triplet Loss: Person Re-Identification With Fine-Grained Difference-Aware Pairwise Loss

A Discriminatively Learned CNN Embedding for Person Reidentification

Multi-level Similarity Perception Network for Person Re-identification

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Deep Recurrent Convolutional Networks for Video-based Person Re-identification: An End-to-End Approach

An Unbiased Temporal Representation for Video-Based Person Re-Identification

Feature separation and double causal comparison loss for visible and infrared person re-identification

Hierarchical Temporal Modeling With Mutual Distance Matching for Video Based Person Re-Identification

Person Re-Identification By Video Ranking

Video-Based Person Re-Identification Using Spatial-Temporal Memory Coupling Network

Learning Intra and Inter-Camera Invariance for Isolated Camera Supervised Person Re-identification