Abstract: Person re-identification aims at identifying a given pedestrian across non-overlapping multi-camera networks at different times and places. Existing person re-identification approaches mainly focus on matching pedestrians in images; however, little attention has been paid to re-identifying pedestrians in videos. Compared to images, video clips contain the motion patterns of pedestrians, which are crucial to person re-identification. Moreover, consecutive video frames present pedestrian appearance under different body poses and from different viewpoints, providing valuable information for addressing challenges such as pose variation, occlusion, and viewpoint change. In this article, we propose a Dense 3D-Convolutional Network (D3DNet) to jointly learn spatio-temporal and appearance representations for person re-identification in videos. The D3DNet consists of multiple three-dimensional (3D) dense blocks and transition layers. The 3D dense blocks enlarge the receptive fields of visual neurons in both the spatial and temporal dimensions. This yields discriminative appearance representations as well as short-term and long-term motion patterns of pedestrians, without requiring an additional motion estimation module. In addition, we formulate a loss function consisting of an identification loss and a center loss that minimizes intra-class variance and maximizes inter-class variance simultaneously, addressing the challenge of large intra-class variance and small inter-class variance. Extensive experiments on two real-world video datasets for person re-identification, MARS and iLIDS-VID, demonstrate the effectiveness of the proposed approach.
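The joint objective of an identification loss plus a center loss can be illustrated with a minimal, framework-free Python sketch. It rests on several assumptions that the abstract does not confirm:

- The identification loss is taken to be softmax cross-entropy over identity classes.
- The center loss follows the common formulation 1/2 ||x_i - c_{y_i}||^2, and centers are updated with the rule from the original center-loss work.
- Both terms are averaged over the batch.
- The weight `lam` and the center learning rate `alpha` are hypothetical values, not ones reported for D3DNet.

All function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Identification loss for one sample: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature: Sequence[float], center: Sequence[float]) -> float:
    """Center loss for one sample: half the squared distance to its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(features, logits, labels, centers, lam: float = 0.01) -> float:
    """Mean identification loss plus lam times mean center loss over a batch.

    lam is a hypothetical balancing weight; the abstract does not report one.
    """
    n = len(labels)
    id_loss = sum(softmax_cross_entropy(z, y) for z, y in zip(logits, labels)) / n
    c_loss = sum(center_loss(f, centers[y]) for f, y in zip(features, labels)) / n
    return id_loss + lam * c_loss


def update_centers(features, labels, centers, alpha: float = 0.5) -> List[List[float]]:
    """Move each class center toward the batch features of that class.

    Assumed rule: delta_j = sum_i (c_j - x_i) / (1 + n_j) over samples of class j,
    then c_j <- c_j - alpha * delta_j. alpha is a hypothetical learning rate.
    """
    new_centers = [list(c) for c in centers]
    for j, c in enumerate(centers):
        members = [f for f, y in zip(features, labels) if y == j]
        if not members:
            continue  # classes absent from the batch keep their current center
        delta = [sum(ck - f[k] for f in members) / (1 + len(members))
                 for k, ck in enumerate(c)]
        new_centers[j] = [ck - alpha * dk for ck, dk in zip(c, delta)]
    return new_centers
```

For example, when every feature already sits on its class center, the center term is zero and `joint_loss` reduces to the mean cross-entropy. `update_centers` pulls each center toward the features of its class in the current batch. Together, the two terms shrink intra-class spread while the cross-entropy keeps identities separable.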

Convolutional LSTM networks for video-based person re-identification

Deep Recurrent Convolutional Networks for Video-based Person Re-identification: An End-to-End Approach

Instance Hard Triplet Loss for In-video Person Re-identification

Person Re-identification Based on Transform Algorithm

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

Joining Features by Global Guidance with Bi-Relevance Trihard Loss for Person Re-Identification

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Multi-Scale 3D Convolution Network for Video Based Person Re-Identification

Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Video-based Person Re-identification with Two-stream Convolutional Network and Co-attentive Snippet Embedding

Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

Three-Stream Convolutional Networks for Video-based Person Re-Identification

An Unbiased Temporal Representation for Video-Based Person Re-Identification

Video-based Person Re-identification with Long Short-Term Representation Learning

Video-Based Person Re-Identification Using Spatial-Temporal Memory Coupling Network

Multi-Level Fusion Temporal-Spatial Co-Attention for Video-Based Person Re-Identification

AA-RGTCN: Reciprocal Global Temporal Convolution Network with Adaptive Alignment for Video-Based Person Re-Identification

Person Re-Identification by Unsupervised Video Matching

Person Re-Identification By Video Ranking

Learning Compact Appearance Representation for Video-Based Person Re-Identification