Abstract: Video-based person re-identification (Re-ID) aims to retrieve a person from video sequences captured by non-overlapping cameras. Because viewpoints, postures, and occlusions vary over time, some characteristics of a pedestrian do not appear consecutively across frames. However, existing methods ignore this peculiarity of the data, and their networks tend to learn only the salient characteristics that persist across consecutive frames of a video sequence. As a result, the learned representations fail to cover all the characteristics of pedestrians and therefore lack integrity and discrimination. To tackle this problem, we present a novel deep architecture termed the Hierarchical Mining Network (HMN), which mines as many of the pedestrians' characteristics as possible by exploiting temporal and intra-class knowledge. It consists of a novel Attentive Temporal Module (ATM) and a Dynamic Supervising Branch (DSB), with a Balancing Triplet Loss (BTL) assisting the training. Equipped with pedestrian-perceiving capacity, the proposed ATM evaluates each feature activation through temporal analysis, so that the temporally scattered characteristics of pedestrians can be better aggregated and the contaminated ones can be eliminated. The DSB, together with the BTL, then further enhances the integrity of the representations through multiple supervision. Specifically, the DSB perceives the diversity of intra-class samples in each mini-batch and generates targeted supervising signals for them, while the BTL ensures that these signals exhibit smaller intra-class variations and larger inter-class variations. Comprehensive experiments on two video-based datasets, i.e., MARS and DukeMTMC-VideoReID, demonstrate the contribution of each component and the superiority of the proposed HMN over state-of-the-art methods. Benchmarking our model on three popular image-based datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, additionally verifies the promising generalizability of the proposed DSB and BTL.
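The abstract gives no equations for the ATM or the BTL, so the sketch below only illustrates the two general ideas it relies on, using generic, well-known techniques rather than the authors' modules. `temporal_attention_pool` is a hypothetical per-channel temporal softmax pooling: each feature channel gets its own weights over time, so a characteristic that is strongly activated in only a few frames is not averaged away. `batch_hard_triplet_loss` is a standard batch-hard triplet loss that pulls same-identity samples together and pushes different identities apart; the "balancing" term of the actual BTL is not specified in the abstract and is not modeled here. All function names, the temperature, and the margin value are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frames: Sequence[Vector], temperature: float = 1.0) -> Vector:
    """Pool T frame-level features (each of length D) into one clip-level feature.

    Hypothetical illustration, NOT the paper's ATM: every channel gets its own
    softmax weights over time, so an activation that is strong in only a few
    frames is not diluted the way plain temporal averaging would dilute it.
    """
    num_frames, dim = len(frames), len(frames[0])
    pooled: Vector = []
    for d in range(dim):
        column = [frames[t][d] for t in range(num_frames)]
        weights = softmax([v / temperature for v in column])
        pooled.append(sum(w * v for w, v in zip(weights, column)))
    return pooled


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(
    features: Sequence[Vector], labels: Sequence[int], margin: float = 0.3
) -> float:
    """Standard batch-hard triplet loss (NOT the paper's BTL).

    For each anchor, take its farthest same-identity sample and its nearest
    different-identity sample, and penalize the anchor when the nearest
    negative is not at least `margin` farther away than the farthest positive.
    """
    losses = []
    for i, anchor in enumerate(features):
        pos = [euclidean(anchor, f) for j, f in enumerate(features) if j != i and labels[j] == labels[i]]
        neg = [euclidean(anchor, f) for j, f in enumerate(features) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # anchor has no valid positive or negative in this mini-batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    clip = [[1.0, 0.0], [1.0, 0.0], [1.0, 5.0]]  # channel 1 fires in one frame only
    print(temporal_attention_pool(clip))  # channel 1 stays close to 5; mean pooling gives about 1.67
    batch = [[0.0, 0.0], [3.0, 0.0], [1.0, 0.0], [4.0, 0.0]]  # identities overlap, so loss > 0
    print(batch_hard_triplet_loss(batch, [0, 0, 1, 1]))
```

In the usage lines, the second channel is active in a single frame, yet the pooled value stays near its peak instead of dropping to the temporal mean, which is the kind of "temporally scattered characteristic" the abstract argues should be preserved.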

Robust Video-Based Person Re-Identification by Hierarchical Mining

Multi-Level Fusion Temporal-Spatial Co-Attention for Video-Based Person Re-Identification

Hierarchical Temporal Modeling With Mutual Distance Matching for Video Based Person Re-Identification

Video Person Re-Identification by Temporal Residual Learning

Multi-scale Spatial-temporal Network for Person Re-identification

A Hierarchical Scheme for Video-Based Person Re-identification Using Lightweight PCANet and Handcrafted LOMO Features

Hierarchical Bi-Directional Feature Perception Network for Person Re-Identification

Discriminative feature extraction for video person re-identification via multi-task network

Person Re-Identification by Unsupervised Video Matching.

Deep Spatial-Temporal Fusion Network for Video-Based Person Re-identification.

Multi-Scale 3D Convolution Network for Video Based Person Re-Identification.