Abstract: Employing attention mechanisms to model both global and local features as a final pedestrian representation has become a trend for person re-identification (Re-ID) algorithms. A potential limitation of these methods is that they focus on the most salient features, whereas re-identifying a person may depend on diverse clues that the most salient features mask in different situations, e.g., body, clothes or even shoes. To handle this limitation, we propose a novel Salience-guided Cascaded Suppression Network (SCSN), which enables the model to mine diverse salient features and integrate them into the final representation in a cascaded manner. Our work makes the following contributions: (i) We observe that previously learned salient features may hinder the network from learning other important information. To tackle this limitation, we introduce a cascaded suppression strategy, which enables the network to mine, stage by stage, diverse potentially useful features that are masked by other salient features; each stage contributes a different feature embedding to the final discriminative pedestrian representation. (ii) We propose a Salient Feature Extraction (SFE) unit, which suppresses the salient features learned in the previous cascaded stage and then adaptively extracts other potentially salient features to obtain different clues about pedestrians. (iii) We develop an efficient feature aggregation strategy that increases the network's capacity to capture all potentially salient features. Finally, experimental results demonstrate that our proposed method outperforms state-of-the-art methods on four large-scale datasets. In particular, our approach exceeds the current best method by over 7% on the CUHK03 dataset.
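
The cascaded suppression idea can be illustrated with a toy sketch. This is not the SCSN architecture or its SFE unit, whose details the abstract does not give. It is a hand-written stand-in that assumes salience is the sum of channel activations at each position, that suppression zeroes out the top-k positions, and that each stage is summarized by channel-wise max pooling. All function names and parameters (`salience`, `suppress_top_k`, `cascaded_embeddings`, `stages`, `k`) are hypothetical.

```python
from typing import List

Feature = List[List[float]]  # positions x channels (assumed layout)


def salience(feature: Feature) -> List[float]:
    """Per-position salience, assumed here to be the sum of channel activations."""
    return [sum(pos) for pos in feature]


def suppress_top_k(feature: Feature, k: int) -> Feature:
    """Zero out the k most salient positions so the next stage must look elsewhere."""
    scores = salience(feature)
    ranked = sorted(range(len(feature)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return [[0.0] * len(pos) if i in top else list(pos)
            for i, pos in enumerate(feature)]


def global_max_pool(feature: Feature) -> List[float]:
    """Channel-wise max over all positions: one embedding per stage."""
    return [max(col) for col in zip(*feature)]


def cascaded_embeddings(feature: Feature, stages: int, k: int) -> List[float]:
    """Concatenate one pooled embedding per stage, suppressing between stages."""
    out: List[float] = []
    current = feature
    for _ in range(stages):
        out.extend(global_max_pool(current))
        current = suppress_top_k(current, k)
    return out


# Toy map: 4 positions, 2 channels. Position 0 dominates both channels.
toy = [[9.0, 8.0], [1.0, 5.0], [4.0, 1.0], [0.5, 0.5]]
result = cascaded_embeddings(toy, stages=3, k=1)
print(result)
```

In this toy map the first stage is dominated by position 0. Once that position is suppressed, the later stages pick up the weaker responses at positions 1 and 2, which is the masking effect the abstract describes.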

Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos

A Novel Two-Stream Saliency Image Fusion CNN Architecture for Person Re-Identification

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

MSTN: A Multi-granular Spatial–Temporal Network for video-based person re-identification

Contribution-Based Multi-Stream Feature Distance Fusion Method with $k$-Distribution Re-Ranking for Person Re-Identification

Person Re-identification Network Based on Multi-Level Feature Fusion

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

Salience-Guided Cascaded Suppression Network for Person Re-identification

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Temporal Attribute-Appearance Learning Network for Video-based Person Re-Identification

Saliency and Granularity: Discovering Temporal Coherence for Video-Based Person Re-Identification

Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Event-Guided Person Re-Identification Via Sparse-Dense Complementary Learning

Context Sensing Attention Network for Video-based Person Re-identification

Video-Based Person Re-Identification Using Spatial-Temporal Memory Coupling Network

Discriminative Spatial Feature Learning for Person Re-Identification

Temporal Complementary Learning for Video Person Re-Identification

ASTA-Net: Adaptive Spatio-Temporal Attention Network for Person Re-Identification in Videos

Learning Visual-Spatial Saliency for Multiple-Shot Person Re-Identification

BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification