An Unbiased Temporal Representation for Video-Based Person Re-Identification

Xiu Zhang, Bir Bhanu
DOI: https://doi.org/10.1109/icip.2018.8451518
2018-01-01
Abstract: Person re-identification (re-id) aims to associate pedestrians across different camera views. Compared with still-image-based re-id, video-based re-id provides not only spatial information but also the temporal dependency among frames. Most existing works use convolutional neural networks (CNNs) as spatial feature extractors and then train recurrent neural networks (RNNs) with backpropagation through time (BPTT) to capture temporal information. However, long-term dependencies are very hard for RNNs to learn via BPTT because of vanishing or exploding gradients. In the re-id task, long-term dependency is common, since the key information (the identity of the pedestrian) is present in most frames of a given sequence. Thus, the importance of a frame should not be determined by its position in the sequence, yet state-of-the-art RNN-based models usually weight frames in this position-biased way. In this paper, we argue that long-term dependency can be very important and propose an unbiased siamese recurrent convolutional neural network architecture to model and associate pedestrians in videos. Experimental results on two public datasets demonstrate the effectiveness of the proposed method.
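To make the positional-bias argument concrete, here is a toy sketch in plain Python. It is not the authors' architecture, which the abstract does not specify. It assumes a hypothetical element-wise linear RNN h_t = w * h_{t-1} + x_t whose final hidden state serves as the sequence descriptor, and compares it with temporal mean pooling of per-frame features, a common position-independent readout used here only as a stand-in for an "unbiased" representation. Unrolling the RNN gives h_T = sum_t w^(T-t) * x_t, so for |w| < 1 the influence of early frames (and the gradient BPTT sends back to them) shrinks geometrically with distance from the end. The function names, the weight w, and the 2-D "features" are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Frame = List[float]  # hypothetical per-frame CNN feature vector


def frame_influence(T: int, w: float) -> List[float]:
    """dh_T/dx_t for the scalar linear RNN h_t = w * h_{t-1} + x_t (h_0 = 0).

    Unrolling gives h_T = sum_t w**(T - t) * x_t, so the weight of frame t
    depends only on its position; for |w| < 1 it decays geometrically,
    which is also the factor BPTT multiplies into the gradient reaching frame t.
    """
    return [w ** (T - t) for t in range(1, T + 1)]


def rnn_last_state(frames: Sequence[Frame], w: float) -> Frame:
    """Position-biased descriptor: final hidden state of an element-wise linear RNN."""
    h = [0.0] * len(frames[0])
    for x in frames:
        h = [w * hi + xi for hi, xi in zip(h, x)]
    return h


def temporal_mean(frames: Sequence[Frame]) -> Frame:
    """Position-independent descriptor: every frame gets weight 1/T."""
    T = len(frames)
    return [sum(col) / T for col in zip(*frames)]


def siamese_distance(seq_a: Sequence[Frame], seq_b: Sequence[Frame],
                     encode: Callable[[Sequence[Frame]], Frame]) -> float:
    """Siamese comparison: the SAME encoder embeds both sequences, then Euclidean distance."""
    return math.dist(encode(seq_a), encode(seq_b))


# Toy example: same identity ([1, 0]) in every frame; seq_b's last frame is occluded.
seq_a = [[1.0, 0.0]] * 4
seq_b = [[1.0, 0.0]] * 3 + [[0.0, 1.0]]

biased = siamese_distance(seq_a, seq_b, lambda s: rnn_last_state(s, w=0.5))
unbiased = siamese_distance(seq_a, seq_b, temporal_mean)
print(round(biased, 3))    # 1.414
print(round(unbiased, 3))  # 0.354
```

With w = 0.5 the frame weights are [0.125, 0.25, 0.5, 1.0], so a single occluded final frame dominates the last-state descriptor and pushes two sequences of the same toy identity far apart, while mean pooling weights all four frames equally and keeps them close.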