Abstract: Person re-identification aims to identify a given pedestrian across non-overlapping multi-camera networks at different times and places. Existing person re-identification approaches mainly focus on matching pedestrians in images; little attention has been paid to re-identifying pedestrians in videos. Compared to images, video clips contain the motion patterns of pedestrians, which are crucial to person re-identification. Moreover, consecutive video frames show a pedestrian's appearance in different body poses and from different viewpoints, providing valuable information for addressing challenges such as pose variation, occlusion, and viewpoint change. In this article, we propose a Dense 3D-Convolutional Network (D3DNet) to jointly learn spatio-temporal and appearance representations for person re-identification in videos. D3DNet consists of multiple three-dimensional (3D) dense blocks and transition layers. The 3D dense blocks enlarge the receptive fields of visual neurons in both the spatial and temporal dimensions, yielding discriminative appearance representations as well as short-term and long-term motion patterns of pedestrians without requiring an additional motion estimation module. In addition, we formulate a loss function consisting of an identification loss and a center loss that simultaneously minimizes intra-class variance and maximizes inter-class variance, addressing the challenge of large intra-class variance and small inter-class variance. Extensive experiments on two real-world video datasets for person re-identification, MARS and iLIDS-VID, demonstrate the effectiveness of the proposed approach.
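
The receptive-field growth attributed to the 3D dense blocks can be illustrated with a small bookkeeping sketch. It is not the D3DNet implementation: it assumes, as hypothetical choices not stated in the abstract, 3x3x3 convolutions with stride 1 inside each dense block, a growth rate of 32, and DenseNet-BC-style transition layers (a 1x1x1 compression convolution followed by 2x2x2 average pooling). The names `AxisRF`, `dense_block`, and `transition` are illustrative. The sketch tracks the temporal axis only; with cubic kernels the same recurrence applies to height and width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisRF:
    """Receptive-field bookkeeping along one axis (here: time, in frames)."""
    size: int = 1  # how many input frames one output position depends on
    jump: int = 1  # input-frame distance between adjacent output positions

    def apply(self, kernel: int, stride: int = 1) -> "AxisRF":
        # Standard receptive-field recurrence for a conv or pooling layer.
        return AxisRF(self.size + (kernel - 1) * self.jump, self.jump * stride)


def dense_block(channels, rf, num_layers, growth_rate, kernel=3):
    """Trace a 3D dense block (hypothetical 3x3x3 convs, stride 1, 'same' padding).

    Each layer consumes the concatenation of the block input and all earlier
    layer outputs, so its receptive field grows from the deepest input.
    Returns output channels, the deepest receptive field, and the receptive
    field of every tensor in the block's concatenated output.
    """
    scales = [rf.size]  # the block input itself is part of the output concat
    deepest = rf
    for _ in range(num_layers):
        deepest = deepest.apply(kernel)
        scales.append(deepest.size)
        channels += growth_rate  # each layer appends growth_rate new feature maps
    return channels, deepest, scales


def transition(channels, rf, compression=0.5, pool=2):
    """Hypothetical transition: 1x1x1 conv (channel compression) + average pooling."""
    channels = int(channels * compression)
    rf = rf.apply(1).apply(pool, stride=pool)
    return channels, rf


if __name__ == "__main__":
    ch, rf = 64, AxisRF()
    ch, rf, scales = dense_block(ch, rf, num_layers=4, growth_rate=32)
    print("block 1:", ch, scales)  # block 1: 192 [1, 3, 5, 7, 9]
    ch, rf = transition(ch, rf)
    ch, rf, scales = dense_block(ch, rf, num_layers=4, growth_rate=32)
    print("block 2:", ch, scales)  # block 2: 224 [10, 14, 18, 22, 26]
```

Because a dense block outputs the concatenation of its input and every layer's output, the first block in this sketch carries features spanning 1, 3, 5, 7, and 9 frames at once. This is one way to read the abstract's claim that short-term and long-term motion patterns are captured without a separate motion estimation module.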

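The joint objective of an identification loss plus a center loss can be sketched in plain Python. This assumes the center-loss formulation of Wen et al. (ECCV 2016) combined with softmax cross-entropy as L = L_ID + lam * L_C, where L_C is half the batch-mean squared distance between each feature and its class center. The weight `lam`, the center learning rate `alpha`, and the batch-mean normalization are hypothetical choices, not values taken from the abstract.

```python
import math


def id_loss(logits, label):
    """Softmax cross-entropy for one sample (identification loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return total / len(features)


def joint_loss(logits_batch, features, labels, centers, lam=0.01):
    """Mean identification loss + lam * center loss (lam is a hypothetical weight)."""
    l_id = sum(id_loss(z, y) for z, y in zip(logits_batch, labels)) / len(labels)
    return l_id + lam * center_loss(features, labels, centers)


def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward the mean of its features in the batch.

    Uses the center-loss update c_j <- c_j - alpha * delta_j with
    delta_j = sum_{i: y_i = j} (c_j - x_i) / (1 + n_j).
    Returns a new list of centers; classes absent from the batch are unchanged.
    """
    new_centers = [list(c) for c in centers]
    for j, c in enumerate(centers):
        members = [x for x, y in zip(features, labels) if y == j]
        if not members:
            continue
        for d in range(len(c)):
            delta = sum(c[d] - x[d] for x in members) / (1 + len(members))
            new_centers[j][d] = c[d] - alpha * delta
    return new_centers
```

The center term pulls features of the same identity toward a shared center (reducing intra-class variance), while the cross-entropy term keeps identities separable by the classifier. In this sketch the centers are updated by their own rule rather than by gradient descent, as in the original center-loss recipe.
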
Learning Recurrent 3D Attention for Video-Based Person Re-Identification

Spatial-Temporal Attention-aware Learning for Video-based Person Re-identification

Temporal Regularized Spatial Attention for Video-Based Person Re-Identification

Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Triplet Attention Network for Video-Based Person Re-Identification

Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention

Multi-layer Attention for Person Re-Identification

A Multi-Scale Spatial-Temporal Attention Model for Person Re-Identification in Videos

Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification

Recurrent Models of Visual Co-Attention for Person Re-Identification

Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network

ASTA-Net: Adaptive Spatio-Temporal Attention Network for Person Re-Identification in Videos

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Diverse Part Attentive Network for Video-Based Person Re-Identification

AA-RGTCN: Reciprocal Global Temporal Convolution Network with Adaptive Alignment for Video-Based Person Re-Identification

Complex Spatial-Temporal Attention Aggregation for Video Person Re-Identification

Multi-Scale Temporal Cues Learning for Video Person Re-Identification

Parallel Attention with Weighted Efficient Network for Video-Based Person Re-Identification

Relation-Guided Spatial Attention and Temporal Refinement for Video-Based Person Re-Identification

Video-based Person Re-Identification Via Spatio-Temporal Attentional and Two-Stream Fusion Convolutional Networks