Abstract: Person re-identification aims to identify a given pedestrian across non-overlapping multi-camera networks at different times and places. Existing person re-identification approaches mainly focus on matching pedestrians in images, and little attention has been paid to re-identifying pedestrians in videos. Compared with images, video clips contain the motion patterns of pedestrians, which are crucial to person re-identification. Moreover, consecutive video frames show a pedestrian's appearance in different body poses and from different viewpoints, providing valuable information for addressing challenges such as pose variation, occlusion, and viewpoint change. In this article, we propose a Dense 3D-Convolutional Network (D3DNet) that jointly learns spatio-temporal and appearance representations for person re-identification in videos. D3DNet consists of multiple three-dimensional (3D) dense blocks and transition layers. The 3D dense blocks enlarge the receptive fields of visual neurons in both the spatial and temporal dimensions, yielding discriminative appearance representations as well as short-term and long-term motion patterns of pedestrians without requiring an additional motion estimation module. In addition, we formulate a loss function consisting of an identification loss and a center loss to simultaneously minimize intra-class variance and maximize inter-class variance, addressing the challenge of large intra-class variance and small inter-class variance. Extensive experiments on two real-world video datasets for person re-identification, MARS and iLIDS-VID, demonstrate the effectiveness of the proposed approach.
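The abstract names the two loss terms but does not give their exact formulation or weighting. The following is a minimal sketch, assuming a standard softmax cross-entropy identification loss and the center loss of Wen et al. (ECCV 2016), combined as L = L_id + lam * L_c. The function names, the weight `lam`, the center learning rate `alpha`, and the batch averaging are all hypothetical choices for illustration, and plain Python lists stand in for the feature vectors a network such as D3DNet would produce.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def identification_loss(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(features: Sequence[Vector], labels: Sequence[int],
                centers: Dict[int, Vector]) -> float:
    """Batch mean of 0.5 * ||x_i - c_{y_i}||^2; pulls each feature toward its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += 0.5 * sum((xd - cd) ** 2 for xd, cd in zip(x, centers[y]))
    return total / len(features)


def joint_loss(logits: Sequence[Sequence[float]], features: Sequence[Vector],
               labels: Sequence[int], centers: Dict[int, Vector],
               lam: float = 0.01) -> float:
    """L = L_id + lam * L_c; lam is a hypothetical weight, not a value from the paper."""
    l_id = sum(identification_loss(z, y) for z, y in zip(logits, labels)) / len(labels)
    return l_id + lam * center_loss(features, labels, centers)


def update_centers(features: Sequence[Vector], labels: Sequence[int],
                   centers: Dict[int, Vector], alpha: float = 0.5) -> None:
    """In-place center update in the style of Wen et al.:
    c_j <- c_j - alpha * sum_{i: y_i = j} (c_j - x_i) / (1 + n_j)."""
    for j in set(labels):
        members = [x for x, y in zip(features, labels) if y == j]
        c = centers[j]
        delta = [sum(cd - x[d] for x in members) / (1 + len(members))
                 for d, cd in enumerate(c)]
        centers[j] = [cd - alpha * dd for cd, dd in zip(c, delta)]


if __name__ == "__main__":
    centers = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # one center per identity
    feats = [[1.0, 0.0], [0.0, 1.0]]          # toy embeddings for two clips
    logits = [[2.0, 0.0], [0.0, 2.0]]         # toy classifier outputs
    print(joint_loss(logits, feats, [0, 1], centers))
    update_centers(feats, [0, 1], centers)     # centers move toward their class features
    print(centers)
```

In this sketch the identification term is what separates identities from each other, while the center term only shrinks the spread of each identity's features around its center; in a deep-learning framework the first term would typically be the framework's built-in cross-entropy and the centers would be kept as a separate buffer updated once per mini-batch.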

3D PersonVLAD: Learning Deep Global Representations for Video-based Person Re-identification

Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention

Deep Recurrent Convolutional Networks for Video-based Person Re-identification: An End-to-End Approach

Few-Shot Deep Adversarial Learning for Video-based Person Re-identification

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Deep Spatial-Temporal Fusion Network for Video-Based Person Re-identification

Convolutional LSTM Networks for Video-based Person Re-identification

Video-based Person Re-identification with Long Short-Term Representation Learning

Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Global-Local Temporal Representations for Video Person Re-Identification

Deep Video-based Person Re-identification (Deep Vid-ReID): Comprehensive Survey

Multi-Scale 3D Convolution Network for Video Based Person Re-Identification

Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification

Adaptive Graph Representation Learning for Video Person Re-identification

Video Person Re-Identification by Temporal Residual Learning

Multi-Scale Temporal Cues Learning for Video Person Re-Identification

Video-Based Person Re-identification by Deep Feature Guided Pooling

Video-based Person Re-identification via Self-Paced Learning and Deep Reinforcement Learning Framework

Person Re-Identification Based on Deep Learning - an Overview

Iterative Local-Global Collaboration Learning Towards One-Shot Video Person Re-Identification